ON THE CONVERGENCE OF HIGHER-ORDER ORTHOGONALITY ITERATION


YANGYANG XU

Abstract. The higher-order orthogonality iteration (HOOI) is widely used for finding a best
low-multilinear-rank approximation of a tensor. However, convergence of its iterate sequence is still an
open question. In this paper, we first analyze a greedy HOOI, which updates each factor matrix by
selecting, from the best candidates, one that is closest to the current iterate. Assuming the existence of a
block-nondegenerate limit point, we establish its global convergence through the so-called
Kurdyka-Łojasiewicz (KL) property. In addition, we show that if the starting point is sufficiently close to
any block-nondegenerate globally optimal solution, the greedy HOOI produces a sequence convergent to a
globally optimal solution. Relating the iterate sequence of the original HOOI to that of the greedy HOOI,
we then show that the same convergence results hold for the original HOOI and thus positively address the
open question.

Key words. higher-order orthogonality iteration (HOOI), global convergence, Kurdyka-Łojasiewicz (KL)
property, greedy algorithm, block coordinate descent


1. Introduction. It is shown in [4] that any tensor (i.e., multi-dimensional array) can be decomposed
into the product of orthogonal matrices and an all-orthogonal core tensor. This decomposition generalizes the
matrix SVD and is today commonly called higher-order singular value decomposition (HOSVD) or multilinear
SVD. Applications such as multilinear subspace learning [18] and multilinear principal component analysis
[17] usually seek a low-multilinear-rank approximation of a given tensor. Unlike the truncated matrix SVD,
the truncated HOSVD gives a good but not necessarily the best low-multilinear-rank approximation of the
given tensor. To obtain a better approximation, people (e.g., [5,6,12]) solve the best
rank-$(r_1,\ldots,r_N)$ approximation problem

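$$\min_{\boldsymbol{\mathcal{C}},\mathbf{A}} \ \|\boldsymbol{\mathcal{X}} - \boldsymbol{\mathcal{C}} \times_1 \mathbf{A}_1 \cdots \times_N \mathbf{A}_N\|_F^2, \quad \text{s.t. } \mathbf{A}_n \in \mathcal{O}_{I_n\times r_n},\ \forall n, \tag{1.1}$$

where $\boldsymbol{\mathcal{X}} \in \mathbb{R}^{I_1\times\cdots\times I_N}$ is a given tensor, $\times_n$ denotes mode-$n$
tensor-matrix multiplication (see the definition in (1.3) below), and

$$\mathcal{O}_{I_n\times r_n} = \{\mathbf{A}_n \in \mathbb{R}^{I_n\times r_n} : \mathbf{A}_n^\top\mathbf{A}_n = \mathbf{I}\}.$$

With $\mathbf{A}$ fixed, the optimal core tensor is given by
$\boldsymbol{\mathcal{C}} = \boldsymbol{\mathcal{X}} \times_1 \mathbf{A}_1^\top \cdots \times_N \mathbf{A}_N^\top$. Absorbing this
$\boldsymbol{\mathcal{C}}$ into the objective, one can write (1.1) equivalently as (see [5, Theorem 3.1] for a detailed derivation)

$$\max_{\mathbf{A}} \ \|\boldsymbol{\mathcal{X}} \times_1 \mathbf{A}_1^\top \cdots \times_N \mathbf{A}_N^\top\|_F^2, \quad \text{s.t. } \mathbf{A}_n \in \mathcal{O}_{I_n\times r_n},\ \forall n. \tag{1.2}$$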

One popular method for solving (1.2) is the higher-order orthogonality iteration (HOOI) (see Algorithm
1). Although HOOI is commonly used and practically efficient (already coded in the Matlab Tensor Toolbox
[2] and Tensorlab [21]), existing works only show that the objective value of (1.2) at the generated iterates
is nondecreasing and converges to some value, while the iterate sequence convergence is still an open
question (cf. [11, p. 478]). The iterate sequence convergence (or equivalently the multilinear subspace
convergence) is important because without convergence, running the algorithm for different numbers of
iterations may give severely different multilinear subspaces, and that will ultimately affect the results of
applications. In this paper, we address this open question. Our main results are summarized in the
following theorem.

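Theorem 1.1 (Main Theorem). Let $\{\mathbf{A}^k\}_{k\ge 1}$ be the sequence generated by the HOOI method. We
have:

(i). If $\{\mathbf{A}^k\}_{k\ge 1}$ has a block-nondegenerate (see Definition 1.2) limit point $\bar{\mathbf{A}}$, then
$\bar{\mathbf{A}}$ is a critical point and also a block-wise maximizer of (1.2). In addition,
$\lim_{k\to\infty} \mathbf{A}^k(\mathbf{A}^k)^\top = \bar{\mathbf{A}}\bar{\mathbf{A}}^\top$, where
$\mathbf{A}\mathbf{A}^\top = (\mathbf{A}_1\mathbf{A}_1^\top,\ldots,\mathbf{A}_N\mathbf{A}_N^\top)$.

(ii). If the starting point $\mathbf{A}^0$ is sufficiently close to any block-nondegenerate local maximizer of (1.2),
then the entire sequence $\{\mathbf{A}^k(\mathbf{A}^k)^\top\}_{k\ge 1}$ must converge to some point
$\bar{\mathbf{A}}\bar{\mathbf{A}}^\top$, and $\bar{\mathbf{A}}$ is a local maximizer of (1.2).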

We make some remarks on the assumption and the convergence results. 

Remark 1.1. The block-nondegeneracy assumption is also necessary because, even starting from a critical
point $\bar{\mathbf{A}}$, the HOOI method can still deviate from $\bar{\mathbf{A}}$ if it is not block-nondegenerate
(see Remark 1.3); that is, a degenerate critical point is not stable (see [7] for the perturbation analysis).
In practice, the block-nondegeneracy is always observed, and it is implied by
$\liminf_{k\to\infty} \big(\sigma_{r_n}(\mathbf{G}_n^k) - \sigma_{r_n+1}(\mathbf{G}_n^k)\big) > 0,\ \forall n$, where
$\mathbf{G}_n^k$ is defined by (1.6).

The assumption is similar to the one made for the orthogonal iteration method [9, section 7.3.2]
for computing the $r$-dimensional dominant invariant subspace of a matrix $\mathbf{X}$. Typically, the convergence
of the orthogonal iteration method requires a positive gap between the $r$-th and $(r+1)$-th largest
eigenvalues of $\mathbf{X}$ in magnitude, because otherwise the $r$-dimensional dominant invariant subspace of
$\mathbf{X}$ is not unique.

For a block-wise maximizer $\bar{\mathbf{A}}$, its block-nondegeneracy is equivalent to negative definiteness of each
block Hessian over the Stiefel manifold $\mathcal{O}_{I_n\times r_n}$. Our definition of block-nondegeneracy is
different from the nondegeneracy in [10]. A nondegenerate local maximizer in [10] is a local maximizer that
has a negative definite Hessian, so the nondegeneracy assumption in [10] is strictly stronger than our
block-nondegeneracy assumption.

Remark 1.2. Since the solution to each subproblem (see (1.5)) of the HOOI method is not unique, and in
fact remains a solution after right-multiplication by any orthogonal matrix, we can only hope to establish
convergence of the projection matrix sequence $\{\mathbf{A}^k(\mathbf{A}^k)^\top\}_{k\ge 1}$ instead of
$\{\mathbf{A}^k\}_{k\ge 1}$ itself.
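
The invariance behind Remark 1.2 can be checked numerically. The following is a minimal NumPy sketch, not part of the original analysis; the sizes and variable names are illustrative assumptions. It builds two different orthonormal bases of the same subspace and compares the bases and their projection matrices.

```python
import numpy as np

rng = np.random.default_rng(0)

# An orthonormal basis A (8 x 3) and a random 3 x 3 orthogonal matrix Q.
A, _ = np.linalg.qr(rng.standard_normal((8, 3)))
Q, _ = np.linalg.qr(rng.standard_normal((3, 3)))

# B = A Q is a different orthonormal basis of the same subspace.
B = A @ Q

basis_gap = np.linalg.norm(A - B)              # bases differ
proj_gap = np.linalg.norm(A @ A.T - B @ B.T)   # projections coincide
```

The bases differ while the projection matrices agree up to rounding, which is why convergence is stated for $\mathbf{A}^k(\mathbf{A}^k)^\top$.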

1.1. Basic concepts of tensors. Before proceeding with the discussion, we first review some basic tensor
concepts that we use in this paper; see [11] for a more complete review.

^Here, we need to assume r„ < because S . 
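The $(i_1,\ldots,i_N)$-th component of an $N$-way tensor $\boldsymbol{\mathcal{X}}$ is denoted as $x_{i_1\ldots i_N}$. For
$\boldsymbol{\mathcal{X}},\boldsymbol{\mathcal{Y}} \in \mathbb{R}^{m_1\times\cdots\times m_N}$, their inner product is defined in the
same way as that for matrices, i.e.,

$$\langle \boldsymbol{\mathcal{X}}, \boldsymbol{\mathcal{Y}}\rangle = \sum_{i_1=1}^{m_1}\cdots\sum_{i_N=1}^{m_N} x_{i_1\ldots i_N}\, y_{i_1\ldots i_N}.$$

The Frobenius norm of $\boldsymbol{\mathcal{X}}$ is defined as
$\|\boldsymbol{\mathcal{X}}\|_F = \sqrt{\langle \boldsymbol{\mathcal{X}}, \boldsymbol{\mathcal{X}}\rangle}$. A fiber of
$\boldsymbol{\mathcal{X}}$ is a vector obtained by fixing all indices of $\boldsymbol{\mathcal{X}}$ except one. The mode-$n$
matricization (also called unfolding) of $\boldsymbol{\mathcal{X}}$ is denoted as $\mathrm{unfold}_n(\boldsymbol{\mathcal{X}})$,
which is a matrix with columns being the mode-$n$ fibers of $\boldsymbol{\mathcal{X}}$ in the lexicographical order.
The mode-$n$ product of $\boldsymbol{\mathcal{X}} \in \mathbb{R}^{m_1\times\cdots\times m_N}$ with
$\mathbf{Y} \in \mathbb{R}^{p\times m_n}$ is written as $\boldsymbol{\mathcal{X}} \times_n \mathbf{Y}$, which gives a tensor in
$\mathbb{R}^{m_1\times\cdots\times m_{n-1}\times p\times m_{n+1}\times\cdots\times m_N}$ and is defined component-wise by

$$(\boldsymbol{\mathcal{X}} \times_n \mathbf{Y})_{i_1\ldots i_{n-1}\, j\, i_{n+1}\ldots i_N} = \sum_{i_n=1}^{m_n} x_{i_1 i_2\ldots i_N}\, y_{j i_n}. \tag{1.3}$$

If $\boldsymbol{\mathcal{X}} = \boldsymbol{\mathcal{C}} \times_1 \mathbf{A}_1 \cdots \times_N \mathbf{A}_N$, then for any $n$,

$$\begin{aligned}
\mathrm{unfold}_n(\boldsymbol{\mathcal{X}}) &= \mathbf{A}_n\, \mathrm{unfold}_n(\boldsymbol{\mathcal{C}})\,(\mathbf{A}_N \otimes \cdots \otimes \mathbf{A}_{n+1} \otimes \mathbf{A}_{n-1} \otimes \cdots \otimes \mathbf{A}_1)^\top \\
&= \mathbf{A}_n\, \mathrm{unfold}_n(\boldsymbol{\mathcal{C}} \times_1 \mathbf{A}_1 \cdots \times_{n-1} \mathbf{A}_{n-1} \times_{n+1} \mathbf{A}_{n+1} \cdots \times_N \mathbf{A}_N).
\end{aligned} \tag{1.4}$$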
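
The unfolding and mode-$n$ product reviewed above can be sketched in a few lines of NumPy. This is an illustrative sketch, not the paper's code: the helper names `unfold` and `mode_product` are assumptions, and the column ordering of the unfolding (lowest remaining mode varying fastest) is chosen so that the Kronecker form of the unfolding identity can be checked with `np.kron`.

```python
import numpy as np

def unfold(T, n):
    """Mode-n unfolding: mode n indexes the rows; among the remaining
    modes, the lowest one varies fastest along the columns."""
    return np.reshape(np.moveaxis(T, n, 0), (T.shape[n], -1), order="F")

def mode_product(T, M, n):
    """Mode-n product T x_n M for a matrix M of shape (p, T.shape[n])."""
    return np.moveaxis(np.tensordot(M, T, axes=(1, n)), 0, n)

rng = np.random.default_rng(0)
C = rng.standard_normal((2, 3, 2))                  # core tensor
A = [rng.standard_normal((4, 2)),
     rng.standard_normal((5, 3)),
     rng.standard_normal((3, 2))]                   # factor matrices

# X = C x_1 A_1 x_2 A_2 x_3 A_3
X = C
for n, An in enumerate(A):
    X = mode_product(X, An, n)

# Check the Kronecker form of the unfolding identity for the middle mode.
lhs = unfold(X, 1)
rhs = A[1] @ unfold(C, 1) @ np.kron(A[2], A[0]).T
```

With this ordering, `lhs` and `rhs` agree up to rounding; a different column ordering in `unfold` would require permuting the Kronecker factors accordingly.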


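1.2. Higher-order orthogonality iteration. The HOOI method updates $\mathbf{A}$ by maximizing the
objective of (1.2) alternatingly with respect to $\mathbf{A}_1, \mathbf{A}_2, \ldots, \mathbf{A}_N$, one factor matrix at a
time while the remaining ones are fixed. Specifically, assuming the iterate to be $\mathbf{A}^k$ at the beginning of
the $k$-th iteration, it performs the following update sequentially from $n = 1$ through $N$:

$$\mathbf{A}_n^{k+1} \in \operatorname*{argmax}_{\mathbf{A}_n \in \mathcal{O}_{I_n\times r_n}} \|\mathbf{A}_n^\top \mathbf{G}_n^k\|_F^2, \tag{1.5}$$

where we have used (1.4), and

$$\mathbf{G}_n^k = \mathrm{unfold}_n\big(\boldsymbol{\mathcal{X}} \times_{i<n} (\mathbf{A}_i^{k+1})^\top \times_{i>n} (\mathbf{A}_i^k)^\top\big). \tag{1.6}$$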

Any orthonormal basis of the dominant $r_n$-dimensional left singular subspace of $\mathbf{G}_n^k$ is a solution
of (1.5). The pseudocode of HOOI is given in Algorithm 1.

It is easy to implement Algorithm 1 by simply setting $\mathbf{A}_n^{k+1}$ to the $r_n$ leading left singular vectors
of $\mathbf{G}_n^k$. This implementation is adopted in the Matlab Tensor Toolbox [2] and Tensorlab [21]. However,
such a choice of $\mathbf{A}_n^{k+1}$ causes difficulty in the convergence analysis of the HOOI method. While preparing
this paper, we did not find any work that gives an iterate sequence convergence result for HOOI, except for
our recent paper [22], which establishes subsequence convergence by assuming a strong condition on the
entire iterate sequence. The essential difficulty is the non-uniqueness of the solution of (1.5): the leading
singular vectors are not uniquely determined either.
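
The implementation described above (setting each factor to the leading left singular vectors of $\mathbf{G}_n^k$) can be sketched as follows. This is a minimal NumPy sketch under stated assumptions, not the Tensor Toolbox or Tensorlab code: the helper names `unfold`, `project`, and `hooi`, the fixed iteration count, and the truncated-HOSVD starting point are illustrative choices.

```python
import numpy as np

def unfold(T, n):
    """Mode-n unfolding of T (mode n becomes the row index)."""
    return np.reshape(np.moveaxis(T, n, 0), (T.shape[n], -1), order="F")

def project(X, A, skip=None):
    """Return X x_i A_i^T over all modes i, leaving mode `skip` untouched."""
    G = X
    for i, Ai in enumerate(A):
        if i != skip:
            G = np.moveaxis(np.tensordot(Ai.T, G, axes=(1, i)), 0, i)
    return G

def hooi(X, ranks, iters=30):
    """Plain HOOI: each factor is set to leading left singular vectors of G_n^k."""
    # Truncated HOSVD as the starting point (an assumption of this sketch).
    A = [np.linalg.svd(unfold(X, n), full_matrices=False)[0][:, :r]
         for n, r in enumerate(ranks)]
    objective = []
    for _ in range(iters):
        for n, r in enumerate(ranks):
            # A[:n] already holds the new factors and A[n+1:] the old ones, as in (1.6).
            G = unfold(project(X, A, skip=n), n)
            A[n] = np.linalg.svd(G, full_matrices=False)[0][:, :r]
        # Objective of (1.2) at the end of the sweep.
        objective.append(np.linalg.norm(project(X, A)) ** 2)
    return A, objective

rng = np.random.default_rng(0)
X = rng.standard_normal((6, 7, 8))
A, objective = hooi(X, ranks=(2, 3, 2))
```

The recorded objective values are nondecreasing, which is the known property of HOOI; the sketch says nothing by itself about convergence of the factors, which is the question studied in this paper.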

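To tackle this difficulty, we first analyze a greedy method, which always chooses one solution of (1.5)
that is closest to $\mathbf{A}_n^k$ as follows:

$$\mathbf{A}_n^{k+1} \in \operatorname*{argmin}_{\mathbf{A}_n \in \mathcal{H}_n^k} \|\mathbf{A}_n - \mathbf{A}_n^k\|_F^2, \tag{1.7}$$

where

$$\mathcal{H}_n^k = \operatorname*{argmax}_{\mathbf{A}_n \in \mathcal{O}_{I_n\times r_n}} \|\mathbf{A}_n^\top \mathbf{G}_n^k\|_F^2. \tag{1.8}$$

Algorithm 1: Higher-order orthogonality iteration (HOOI)

Input: $\boldsymbol{\mathcal{X}} \in \mathbb{R}^{I_1\times\cdots\times I_N}$ and $(r_1,\ldots,r_N)$
Initialization: choose $(\mathbf{A}_1^0,\ldots,\mathbf{A}_N^0)$ with $\mathbf{A}_n^0 \in \mathcal{O}_{I_n\times r_n},\ \forall n$
for $k = 0, 1, \ldots$ do
    for $n = 1,\ldots,N$ do
        Set $\mathbf{A}_n^{k+1}$ to an orthonormal basis of the dominant $r_n$-dimensional left singular subspace of $\mathbf{G}_n^k$.
    if some stopping criteria are met then
        Output $\mathbf{A} = \mathbf{A}^{k+1}$, $\boldsymbol{\mathcal{C}} = \boldsymbol{\mathcal{X}} \times_1 \mathbf{A}_1^\top \cdots \times_N \mathbf{A}_N^\top$ and stop.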

The pseudocode of the greedy implementation is shown in Algorithm 2. The subproblem in (1.7) can be
solved by the method given in Remark 2.3. Although (1.7) can in general have multiple solutions, we will
show that near any limit point of the iterate sequence, it must have a unique solution. With the greedy
implementation, we are able to establish iterate sequence convergence of the greedy HOOI method (i.e.,
Algorithm 2), as shown in sections 2 and 3. Through relating (see (4.2) and Figure 1.1) the two iterate
sequences generated by the original (i.e., Algorithm 1) and greedy HOOI methods, we then establish the
iterate sequence convergence of the original HOOI method, as shown in section 4.

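Algorithm 2: Greedy higher-order orthogonality iteration (Greedy-HOOI)

Input: $\boldsymbol{\mathcal{X}} \in \mathbb{R}^{I_1\times\cdots\times I_N}$ and $(r_1,\ldots,r_N)$
Initialization: choose $(\mathbf{A}_1^0,\ldots,\mathbf{A}_N^0)$ with $\mathbf{A}_n^0 \in \mathcal{O}_{I_n\times r_n},\ \forall n$
for $k = 0, 1, \ldots$ do
    for $n = 1,\ldots,N$ do
        Set $\mathbf{A}_n^{k+1}$ by (1.7)
    if some stopping criteria are met then
        Output $\mathbf{A} = \mathbf{A}^{k+1}$, $\boldsymbol{\mathcal{C}} = \boldsymbol{\mathcal{X}} \times_1 \mathbf{A}_1^\top \cdots \times_N \mathbf{A}_N^\top$ and stop.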

1.3. Comparison to other methods. Besides the HOOI method, several other methods have been
developed for solving the low-multilinear-rank tensor approximation problem. One of the earliest methods,
called TUCKALS3, was proposed in [12]. TUCKALS3 also sequentially updates $\mathbf{A}_1$ through $\mathbf{A}_N$ and
then cycles the process, but different from HOOI, it obtains approximate leading left singular vectors of
$\mathbf{G}_n^k$ by carrying out only one step of the so-called Bauer-Rutishauser method [20] starting from
$\mathbf{A}_n^k$. This update is equivalent to solving a linearized version of the subproblem (1.5), and it prevents
$\mathbf{A}_n^{k+1}$ from being far away from $\mathbf{A}_n^k$. Subsequence convergence of TUCKALS3 was established under
the assumption that $(\mathbf{A}_n^k)^\top \mathbf{G}_n^k (\mathbf{G}_n^k)^\top \mathbf{A}_n^k$ is positive definite for all $n$
and $k$. Although TUCKALS3 has slightly lower per-iteration complexity than HOOI, it does not converge as
fast as HOOI, as demonstrated in Figure 1.1.

Recently, some Newton-type methods on manifolds were developed for the low-multilinear-rank tensor
approximation problem, such as the Newton-Grassmann method in [6] and the Riemannian trust region
scheme in [10]. These methods usually exhibit superlinear convergence. Numerical experiments in [10]
demonstrate that for small-size problems, the Riemannian trust region scheme can take far fewer iterations
and also less time than the HOOI method to reach a high level of accuracy based on the gradient information.
However, for medium-size or large-size problems, or if only medium-level accuracy is required, the HOOI
method is superior to the Riemannian trust region scheme and also to several other Newton-type methods.

Under a negative definiteness assumption on the Hessian at a local maximizer, the Newton-type methods
are guaranteed to have superlinear or even quadratic local convergence (cf. [10]). Compared to our
block-nondegeneracy assumption, their assumption is strictly stronger because, as mentioned in Remark 1.1,
for a local maximizer, its block-nondegeneracy is equivalent to the negative definiteness of each block
Hessian. With only the block-nondegeneracy assumption, it is not clear how to show the local convergence of
the Newton-type methods.

1.4. Contributions. We summarize our contributions as follows. 

- We propose a greedy HOOI method. For each update, we select from the best candidates one
that is closest to the current iterate. With the greedy implementation, we show that any
block-nondegenerate limit point is a critical point and also a block-wise maximizer, and if a
block-nondegenerate limit point exists, then the entire iterate sequence converges to this limit point.

- Through relating the iterates of the original HOOI method to those of the greedy HOOI method, we
establish, for the first time, global convergence to a critical point by assuming the existence of
a block-nondegenerate limit point, and local convergence to a local maximizer by assuming sufficient
closeness of the starting point to a block-nondegenerate local maximizer.

- As a result, we show that the iterate sequence converges to a globally optimal solution if the starting
point is sufficiently close to any block-nondegenerate globally optimal solution.

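1.5. Notation and outline. We use bold capital letters $\mathbf{X}, \mathbf{Y}, \ldots$ to denote matrices,
calligraphic letters $\mathcal{S}, \mathcal{U}, \ldots$ for (set-valued) mappings, and bold calligraphic letters
$\boldsymbol{\mathcal{X}}, \boldsymbol{\mathcal{Y}}, \ldots$ for tensors. $\mathbf{I}$ denotes an identity matrix, whose size is
clear from the context. The $i$-th largest singular value of a matrix $\mathbf{X}$ is denoted by
$\sigma_i(\mathbf{X})$. The set of all orthonormal matrices in $\mathbb{R}^{m\times r}$ is denoted as
$\mathcal{O}_{m\times r} = \{\mathbf{X} \in \mathbb{R}^{m\times r} : \mathbf{X}^\top\mathbf{X} = \mathbf{I}\}$.
Throughout the paper, we focus on the real field, but our analysis can be directly extended to the complex field.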

Definition 1.2 (block-nondegeneracy). A feasible solution $\mathbf{A}$ of (1.2) is block-nondegenerate if
$\sigma_{r_n}(\mathbf{G}_n) > \sigma_{r_n+1}(\mathbf{G}_n),\ \forall n$, where

$$\mathbf{G}_n = \mathrm{unfold}_n\big(\boldsymbol{\mathcal{X}} \times_{i\neq n} \mathbf{A}_i^\top\big). \tag{1.9}$$
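
Definition 1.2 can be checked numerically by computing the singular value gap of each $\mathbf{G}_n$. The following is a minimal NumPy sketch; the helper names (`unfold`, `project`, `block_gaps`) and the test sizes are illustrative assumptions, and the sketch assumes each $\mathbf{G}_n$ has more than $r_n$ singular values.

```python
import numpy as np

def unfold(T, n):
    """Mode-n unfolding of T (mode n becomes the row index)."""
    return np.reshape(np.moveaxis(T, n, 0), (T.shape[n], -1), order="F")

def project(X, A, skip=None):
    """Return X x_i A_i^T over all modes i, leaving mode `skip` untouched."""
    G = X
    for i, Ai in enumerate(A):
        if i != skip:
            G = np.moveaxis(np.tensordot(Ai.T, G, axes=(1, i)), 0, i)
    return G

def block_gaps(X, A):
    """sigma_{r_n}(G_n) - sigma_{r_n+1}(G_n) for every mode n, with G_n as in (1.9)."""
    gaps = []
    for n, An in enumerate(A):
        r = An.shape[1]
        s = np.linalg.svd(unfold(project(X, A, skip=n), n), compute_uv=False)
        gaps.append(s[r - 1] - s[r])   # s is sorted in descending order
    return gaps

rng = np.random.default_rng(0)
dims, ranks = (6, 7, 8), (2, 3, 2)
X = rng.standard_normal(dims)
A = [np.linalg.qr(rng.standard_normal((m, r)))[0] for m, r in zip(dims, ranks)]
gaps = block_gaps(X, A)   # A is block-nondegenerate when every gap is positive
```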



[Figure 1.1: two panels, "Synthetic data" and "Yale Face Database B".]

Fig. 1.1. Comparison of HOOI and TUCKALS3 [12] on a randomly generated tensor of size 50 x 50 x 50 with
core size 5 x 5 x 5 and the Yale Face Database B [8, 15] of size 38 x 64 x 2958 with core size 5 x 5 x 20.
All three methods start from the same point, which is given by truncated HOSVD. The subspace relative
change is calculated by
$\|\mathbf{A}^k(\mathbf{A}^k)^\top - \mathbf{A}^{k+1}(\mathbf{A}^{k+1})^\top\|_F \,/\, \|\mathbf{A}^k(\mathbf{A}^k)^\top\|_F$,
and it measures how far the current iterate deviates from satisfying the first-order optimality conditions.
The results show that the original HOOI and the greedy HOOI produce the same multilinear subspace at each
iteration. They converge faster than TUCKALS3 on both synthetic data and the face image database.


Remark 1.3. In general, we are only able to claim convergence given the existence of a block-nondegenerate
limit point. The original HOOI method can deviate from a critical point if it is not block-nondegenerate. To
see this, suppose $\mathbf{A}$ is a block-wise maximizer and thus a critical point. Assume
$\sigma_{r_1}(\mathbf{G}_1) = \sigma_{r_1+1}(\mathbf{G}_1)$. Let the original HOOI method start from $\mathbf{A}$ and update
the first factor to $\tilde{\mathbf{A}}_1$. Then $\tilde{\mathbf{A}}_1$ may not span the same subspace as $\mathbf{A}_1$
because $\mathbf{G}_1$ has more than one dominant $r_1$-dimensional left singular subspace.
Therefore, we cannot guarantee the convergence of the learned multilinear subspace.

The rest of the paper is organized as follows. Section 2 shows subsequence convergence of the greedy 
HOOI. In section 3, global convergence of the greedy HOOI is established under the assumption of the 
existence of a block-nondegenerate limit point. The convergence of the original HOOI is shown in section 4. 
Finally, section 5 concludes the paper. 

2. Subsequence convergence. In this section, we show the subsequence convergence of Algorithm
2, namely, the criticality of the limit points of the iterates. If $\mathbf{A}$ is a critical point of (1.2), then
letting $\boldsymbol{\mathcal{C}} = \boldsymbol{\mathcal{X}} \times_1 \mathbf{A}_1^\top \cdots \times_N \mathbf{A}_N^\top$, we have that
$(\boldsymbol{\mathcal{C}}, \mathbf{A})$ is a critical point of (1.1). Therefore, our analysis will only
focus on (1.2).

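2.1. First-order optimality conditions. The Lagrangian function of (1.2) is

$$\mathcal{L}(\mathbf{A}, \boldsymbol{\Lambda}) = \frac{1}{2}\|\boldsymbol{\mathcal{X}} \times_1 \mathbf{A}_1^\top \cdots \times_N \mathbf{A}_N^\top\|_F^2 - \frac{1}{2}\sum_{n=1}^N \langle \boldsymbol{\Lambda}_n, \mathbf{A}_n^\top\mathbf{A}_n - \mathbf{I}\rangle,$$

where $\boldsymbol{\Lambda} = (\boldsymbol{\Lambda}_1,\ldots,\boldsymbol{\Lambda}_N)$ is the Lagrangian multiplier. The KKT
conditions or first-order optimality conditions of (1.2) can be derived by $\nabla\mathcal{L} = 0$, namely,

$$\mathbf{G}_n\mathbf{G}_n^\top\mathbf{A}_n - \mathbf{A}_n\boldsymbol{\Lambda}_n = 0,\ \forall n, \tag{2.1a}$$

$$\mathbf{A}_n^\top\mathbf{A}_n - \mathbf{I} = 0,\ \forall n, \tag{2.1b}$$

where $\mathbf{G}_n$ is defined in (1.9). From (2.1), we have
$\boldsymbol{\Lambda}_n = \mathbf{A}_n^\top\mathbf{G}_n\mathbf{G}_n^\top\mathbf{A}_n$. Hence, the condition in (2.1a) can
be written as

$$\mathbf{G}_n\mathbf{G}_n^\top\mathbf{A}_n = \mathbf{A}_n\mathbf{A}_n^\top\mathbf{G}_n\mathbf{G}_n^\top\mathbf{A}_n,\ \forall n. \tag{2.1c}$$

We say a point $\mathbf{A}$ is a critical point of (1.2) if it satisfies the conditions in (2.1b) and (2.1c).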
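
Condition (2.1c) suggests a simple numerical test for criticality: measure how far $\mathbf{G}_n\mathbf{G}_n^\top\mathbf{A}_n$ is from its projection onto the range of $\mathbf{A}_n$. The sketch below is illustrative only; the helper names and the synthetic test tensor are assumptions. It uses a tensor of exact multilinear rank $(r_1, r_2, r_3)$, for which the generating factors satisfy (2.1c), and compares against random feasible factors.

```python
import numpy as np

def unfold(T, n):
    """Mode-n unfolding of T (mode n becomes the row index)."""
    return np.reshape(np.moveaxis(T, n, 0), (T.shape[n], -1), order="F")

def project(X, A, skip=None):
    """Return X x_i A_i^T over all modes i, leaving mode `skip` untouched."""
    G = X
    for i, Ai in enumerate(A):
        if i != skip:
            G = np.moveaxis(np.tensordot(Ai.T, G, axes=(1, i)), 0, i)
    return G

def criticality_residual(X, A):
    """Largest Frobenius norm of G_n G_n^T A_n - A_n A_n^T G_n G_n^T A_n over n, cf. (2.1c)."""
    worst = 0.0
    for n, An in enumerate(A):
        G = unfold(project(X, A, skip=n), n)
        M = G @ (G.T @ An)
        worst = max(worst, np.linalg.norm(M - An @ (An.T @ M)))
    return worst

rng = np.random.default_rng(0)
dims, ranks = (6, 7, 8), (2, 3, 2)
A_true = [np.linalg.qr(rng.standard_normal((m, r)))[0] for m, r in zip(dims, ranks)]
core = rng.standard_normal(ranks)
# X = core x_1 A_1 x_2 A_2 x_3 A_3 (project applies the transpose of each listed matrix).
X = project(core, [An.T for An in A_true])

A_rand = [np.linalg.qr(rng.standard_normal((m, r)))[0] for m, r in zip(dims, ranks)]
res_true = criticality_residual(X, A_true)   # essentially zero
res_rand = criticality_residual(X, A_rand)   # generically nonzero
```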

The following result is well known, and we will use it several times in our convergence analysis. 

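Lemma 2.1 (von Neumann's Trace Inequality [19]). For any matrices $\mathbf{X}, \mathbf{Y} \in \mathbb{R}^{m\times p}$, it holds that

$$|\langle \mathbf{X}, \mathbf{Y}\rangle| \le \sum_{i=1}^{\min(m,p)} \sigma_i(\mathbf{X})\,\sigma_i(\mathbf{Y}). \tag{2.2}$$

The inequality (2.2) holds with equality if $\mathbf{X}$ and $\mathbf{Y}$ have the same left and right singular vectors.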
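
A quick numerical illustration of Lemma 2.1, written as a NumPy sketch with illustrative sizes and names: the first pair of matrices is random, and the second shares its singular vectors with the first, which is the equality case stated in the lemma.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 4))
Y = rng.standard_normal((5, 4))

sx = np.linalg.svd(X, compute_uv=False)
sy = np.linalg.svd(Y, compute_uv=False)
lhs = abs(np.sum(X * Y))          # |<X, Y>|
rhs = np.sum(sx * sy)             # sum_i sigma_i(X) sigma_i(Y)

# Equality case: Y_aligned has the same singular vectors as X.
U, _, Vt = np.linalg.svd(X, full_matrices=False)
d = np.array([4.0, 3.0, 2.0, 1.0])            # singular values of Y_aligned
Y_aligned = U @ np.diag(d) @ Vt
lhs_eq = abs(np.sum(X * Y_aligned))
rhs_eq = np.sum(sx * d)
```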

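2.2. Properties of the solution of (1.5). To show the convergence of Algorithm 2, we analyze the
solution of the subproblem (1.5), which can be written in the following general form:

$$\min_{\mathbf{Z} \in \mathcal{H}_{\mathbf{Y}}} \|\mathbf{Z} - \mathbf{X}\|_F^2, \tag{2.3}$$

where $\mathbf{X} \in \mathcal{O}_{m\times r}$ and $\mathbf{Y} \in \mathbb{R}^{m\times p}$ are given, and

$$\mathcal{H}_{\mathbf{Y}} = \operatorname*{argmax}_{\mathbf{Z} \in \mathcal{O}_{m\times r}} \|\mathbf{Z}^\top\mathbf{Y}\|_F^2. \tag{2.4}$$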

Definition 2.2 (Quotient set of left leading singular vectors). Given a matrix
$\mathbf{Y} \in \mathbb{R}^{m\times p}$ and a positive integer $r \le \min(m,p)$, define

$$\mathcal{B}(\mathbf{Y},r) = \{\mathbf{U} \in \mathcal{O}_{m\times r} : \mathrm{span}(\mathbf{U}) \text{ is a dominant } r\text{-dimensional left singular subspace of } \mathbf{Y}\}.$$

For any $\mathbf{U}_1, \mathbf{U}_2 \in \mathcal{B}(\mathbf{Y},r)$, if $\mathrm{span}(\mathbf{U}_1) = \mathrm{span}(\mathbf{U}_2)$, i.e.,
they span the same subspace, we say they are equivalent. By this equivalence relation, we partition
$\mathcal{B}(\mathbf{Y},r)$ into a set of equivalence classes and form a quotient set denoted as $\mathcal{U}(\mathbf{Y},r)$.

Remark 2.1. Throughout the paper, we regard $\mathcal{U}(\mathbf{Y},r)$ as the finite set of orthonormal matrices, and
each of its elements is a representative of the bases that span the same subspace. If
$\sigma_r(\mathbf{Y}) > \sigma_{r+1}(\mathbf{Y})$, then $\mathbf{Y}$ has a unique dominant $r$-dimensional left singular
subspace, and $\mathcal{U}(\mathbf{Y},r)$ is a singleton. However, if $\sigma_r(\mathbf{Y}) = \sigma_{r+1}(\mathbf{Y})$, then
$\mathbf{Y}$ has multiple dominant $r$-dimensional left singular subspaces, and $\mathcal{U}(\mathbf{Y},r)$ has
more than one element.

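Proposition 2.3. The problem (2.3) has a unique solution if the following two conditions hold:

1. If $\mathbf{U}_* \in \operatorname*{argmax}_{\mathbf{U}\in\mathcal{U}(\mathbf{Y},r)} \|\mathbf{U}^\top\mathbf{X}\|_*$, then $\mathbf{U}_*^\top\mathbf{X}$ is nonsingular;

2. For any $\mathbf{U} \in \mathcal{U}(\mathbf{Y},r)$, if $\mathbf{U} \neq \mathbf{U}_*$, then $\|\mathbf{U}^\top\mathbf{X}\|_* < \|\mathbf{U}_*^\top\mathbf{X}\|_*$;

where $\|\cdot\|_*$ denotes the matrix nuclear norm, defined as the sum of all singular values of a matrix.

Proof. Assume $\mathbf{Z}$ and $\tilde{\mathbf{Z}}$ are both solutions of (2.3). Note that $\mathcal{H}_{\mathbf{Y}}$ in (2.4) is
exactly the set $\mathcal{B}(\mathbf{Y},r)$. Hence, $\mathbf{Z} = \mathbf{U}_z\mathbf{W}_z$ and
$\tilde{\mathbf{Z}} = \mathbf{U}_{\tilde z}\mathbf{W}_{\tilde z}$ for $\mathbf{U}_z, \mathbf{U}_{\tilde z} \in \mathcal{U}(\mathbf{Y},r)$ and some
$\mathbf{W}_z, \mathbf{W}_{\tilde z} \in \mathcal{O}_{r\times r}$. Note

$$\|\mathbf{Z} - \mathbf{X}\|_F^2 = 2r - 2\langle \mathbf{Z}, \mathbf{X}\rangle = 2r - 2\langle \mathbf{W}_z, \mathbf{U}_z^\top\mathbf{X}\rangle.$$

Then by Lemma 2.1 and the optimality of $\mathbf{Z}$ on solving (2.3), we have

$$\langle \mathbf{W}_z, \mathbf{U}_z^\top\mathbf{X}\rangle = \sum_{i=1}^r \sigma_i(\mathbf{U}_z^\top\mathbf{X}) = \|\mathbf{U}_z^\top\mathbf{X}\|_* = \max_{\mathbf{U}\in\mathcal{U}(\mathbf{Y},r)} \|\mathbf{U}^\top\mathbf{X}\|_*. \tag{2.5}$$

Hence, from items 1 and 2, it follows that $\mathbf{U}_z = \mathbf{U}_*$, and similarly $\mathbf{U}_{\tilde z} = \mathbf{U}_*$.

Let $\mathbf{U}_*^\top\mathbf{X} = \bar{\mathbf{U}}\bar{\boldsymbol{\Sigma}}\bar{\mathbf{V}}^\top$ be the full SVD of
$\mathbf{U}_*^\top\mathbf{X}$ and $\mathbf{V}_z = \mathbf{W}_z^\top\bar{\mathbf{U}}$, so $\mathbf{W}_z = \bar{\mathbf{U}}\mathbf{V}_z^\top$.
Then from (2.5), it holds that

$$\sum_{i=1}^r \sigma_i(\mathbf{U}_*^\top\mathbf{X}) = \langle \mathbf{W}_z, \mathbf{U}_*^\top\mathbf{X}\rangle = \langle \mathbf{V}_z^\top, \bar{\boldsymbol{\Sigma}}\bar{\mathbf{V}}^\top\rangle = \langle \mathbf{V}_z^\top\bar{\mathbf{V}}, \bar{\boldsymbol{\Sigma}}\rangle = \sum_{i=1}^r \sigma_i(\mathbf{U}_*^\top\mathbf{X})\,(\mathbf{V}_z^\top\bar{\mathbf{V}})_{ii}.$$

Note that $\sigma_i(\mathbf{U}_*^\top\mathbf{X}) > 0$ and $(\mathbf{V}_z^\top\bar{\mathbf{V}})_{ii} \le 1$. The equality
$\sum_{i=1}^r \sigma_i(\mathbf{U}_*^\top\mathbf{X}) = \sum_{i=1}^r \sigma_i(\mathbf{U}_*^\top\mathbf{X})(\mathbf{V}_z^\top\bar{\mathbf{V}})_{ii}$ holds
only if $(\mathbf{V}_z^\top\bar{\mathbf{V}})_{ii} = 1$ for all $i$. Since $\mathbf{V}_z^\top\bar{\mathbf{V}}$ is orthogonal, we must have
$\mathbf{V}_z^\top\bar{\mathbf{V}} = \mathbf{I}$. Hence, $\mathbf{V}_z = \bar{\mathbf{V}}$ and
$\mathbf{W}_z = \bar{\mathbf{U}}\bar{\mathbf{V}}^\top$. For the same reason,
$\mathbf{W}_{\tilde z} = \bar{\mathbf{U}}\bar{\mathbf{V}}^\top$. Therefore, $\mathbf{Z} = \tilde{\mathbf{Z}}$, and the solution of (2.3)
is unique. □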

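Remark 2.2. It is easy to see that the two conditions in items 1 and 2 are also necessary for uniqueness
of the solution of (2.3). Define

$$\mathcal{S}(\mathbf{Y},r) = \{\mathbf{X} \in \mathcal{O}_{m\times r} : \mathbf{X} \text{ satisfies the two conditions in Proposition 2.3}\}.$$

Then for any $\mathbf{X} \in \mathcal{S}(\mathbf{Y},r)$, (2.3) has a unique solution, which we denote as
$\mathcal{T}_{\mathbf{Y},r}(\mathbf{X})$. In this way, $\mathcal{T}_{\mathbf{Y},r}$ defines a mapping on $\mathcal{S}(\mathbf{Y},r)$.

Remark 2.3. The proof of Proposition 2.3 provides a way for finding a solution of (2.3). Find
$\mathbf{U}_* \in \operatorname*{argmax}_{\mathbf{U}\in\mathcal{U}(\mathbf{Y},r)} \|\mathbf{U}^\top\mathbf{X}\|_*$ and get the full SVD
$\mathbf{U}_*^\top\mathbf{X} = \bar{\mathbf{U}}\bar{\boldsymbol{\Sigma}}\bar{\mathbf{V}}^\top$. Then
$\mathbf{Z}_* = \mathbf{U}_*\bar{\mathbf{U}}\bar{\mathbf{V}}^\top$ is a solution of (2.3).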
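
The recipe in Remark 2.3 can be sketched in NumPy for the case $\sigma_r(\mathbf{Y}) > \sigma_{r+1}(\mathbf{Y})$, where the dominant subspace is unique and any orthonormal basis of it can serve as $\mathbf{U}_*$. This is an illustrative sketch, not the author's code; the function name `closest_dominant_basis` and the test sizes are assumptions.

```python
import numpy as np

def closest_dominant_basis(X, Y, r):
    """Solve (2.3) assuming sigma_r(Y) > sigma_{r+1}(Y), so that the dominant
    r-dimensional left singular subspace of Y is unique."""
    U = np.linalg.svd(Y, full_matrices=False)[0][:, :r]   # a representative U_*
    Ub, _, Vbt = np.linalg.svd(U.T @ X)                    # U_*^T X = Ub S Vb^T
    return U @ Ub @ Vbt                                    # Z_* = U_* Ub Vb^T

rng = np.random.default_rng(0)
m, p, r = 8, 6, 3
Y = rng.standard_normal((m, p))
X, _ = np.linalg.qr(rng.standard_normal((m, r)))           # current iterate
Z = closest_dominant_basis(X, Y, r)

# Other bases of the same dominant subspace, for comparison.
U = np.linalg.svd(Y, full_matrices=False)[0][:, :r]
rotations = [np.linalg.qr(rng.standard_normal((r, r)))[0] for _ in range(20)]
other_distances = [np.linalg.norm(U @ W - X) for W in rotations]
```

The returned `Z` spans the dominant subspace and is at least as close to `X` as every other sampled basis. If `X` already lies in $\mathcal{B}(\mathbf{Y},r)$, the function returns `X` itself up to rounding, in line with Corollary 2.5.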
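Using Proposition 2.3, one can easily show the following two corollaries.

Corollary 2.4. If $\mathbf{X}$ is sufficiently close to one $\mathbf{U}$ in $\mathcal{B}(\mathbf{Y},r)$, then the solution of (2.3) is unique.

Corollary 2.5. If $\mathbf{X} \in \mathcal{B}(\mathbf{Y},r)$, then $\mathcal{T}_{\mathbf{Y},r}(\mathbf{X}) = \mathbf{X}$, i.e., $\mathbf{X}$ is a fixed point.

Furthermore, we can show the continuity of $\mathcal{T}_{\mathbf{Y},r}$.

Theorem 2.6. The mapping $\mathcal{T}_{\mathbf{Y},r}$ is continuous on $\mathcal{S}(\mathbf{Y},r)$.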



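Proof. For convenience of the description, in this proof, we simply write $\mathcal{U}(\mathbf{Y},r)$,
$\mathcal{S}(\mathbf{Y},r)$ and $\mathcal{T}_{\mathbf{Y},r}$ as $\mathcal{U}$, $\mathcal{S}$ and $\mathcal{T}$, respectively.

For any $\mathbf{X} \in \mathcal{S}$, let $\mathbf{Z} = \mathcal{T}(\mathbf{X})$. If $\mathcal{T}$ is not continuous at $\mathbf{X}$, then
there exist $\epsilon > 0$ and a sequence $\{\mathbf{X}^k\}_{k\ge 1}$ in $\mathcal{S}$ such that
$\|\mathbf{X} - \mathbf{X}^k\|_F \le \frac{1}{k}$ and $\|\mathbf{Z} - \mathbf{Z}^k\|_F \ge \epsilon$, where
$\mathbf{Z}^k = \mathcal{T}(\mathbf{X}^k)$. By the definition of $\mathcal{S}$, we know that there is $\mathbf{U} \in \mathcal{U}$ such that
$\|\mathbf{U}^\top\mathbf{X}\|_* > \|\tilde{\mathbf{U}}^\top\mathbf{X}\|_*$ for any $\tilde{\mathbf{U}} \in \mathcal{U}\setminus\{\mathbf{U}\}$.
Similarly, there is a sequence $\{\mathbf{U}^k\}_{k\ge 1}$ in $\mathcal{U}$ such that for each $k$,
$\|(\mathbf{U}^k)^\top\mathbf{X}^k\|_* > \|\tilde{\mathbf{U}}^\top\mathbf{X}^k\|_*$ for any
$\tilde{\mathbf{U}} \in \mathcal{U}\setminus\{\mathbf{U}^k\}$.

Let $\delta = \|\mathbf{U}^\top\mathbf{X}\|_* - \max_{\tilde{\mathbf{U}}\in\mathcal{U}\setminus\{\mathbf{U}\}} \|\tilde{\mathbf{U}}^\top\mathbf{X}\|_* > 0$.
There is a sufficiently large integer $k_0$ such that for all $k \ge k_0$, it holds that
$\|\mathbf{U}^\top\mathbf{X}^k\|_* > \|\mathbf{U}^\top\mathbf{X}\|_* - \frac{\delta}{2}$ and
$\|(\mathbf{U}^k)^\top\mathbf{X}^k\|_* < \|(\mathbf{U}^k)^\top\mathbf{X}\|_* + \frac{\delta}{2}$. Note
$\|\mathbf{U}^\top\mathbf{X}^k\|_* \le \|(\mathbf{U}^k)^\top\mathbf{X}^k\|_*$. Hence,
$\|\mathbf{U}^\top\mathbf{X}\|_* - \frac{\delta}{2} < \|(\mathbf{U}^k)^\top\mathbf{X}\|_* + \frac{\delta}{2}$, i.e.,
$\|\mathbf{U}^\top\mathbf{X}\|_* < \|(\mathbf{U}^k)^\top\mathbf{X}\|_* + \delta$. Therefore, by the
definition of $\delta$, it must hold that $\mathbf{U}^k = \mathbf{U},\ \forall k \ge k_0$.

Hence, we can write $\mathbf{Z} = \mathbf{U}\mathbf{W}_z$ and $\mathbf{Z}^k = \mathbf{U}\mathbf{W}_{z^k}$ for all $k \ge k_0$, where
$\mathbf{W}_z, \mathbf{W}_{z^k} \in \mathcal{O}_{r\times r}$. Note $\mathbf{U}^\top\mathbf{X}^k \to \mathbf{U}^\top\mathbf{X}$ as
$k \to \infty$. Then from the proof of Proposition 2.3, we have $\mathbf{W}_{z^k} \to \mathbf{W}_z$ and thus
$\mathbf{Z}^k \to \mathbf{Z}$ as $k \to \infty$. This contradicts $\|\mathbf{Z} - \mathbf{Z}^k\|_F \ge \epsilon$. Therefore,
$\mathcal{T}$ is continuous at $\mathbf{X}$. Since $\mathbf{X}$ is an arbitrary point in $\mathcal{S}$, this completes the proof. □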

One can also show the following result. 

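Theorem 2.7. Assume $\sigma_r(\mathbf{Y}) > \sigma_{r+1}(\mathbf{Y})$ and $\mathbf{Y}^k \to \mathbf{Y}$ as $k \to \infty$. If
$\mathbf{X} \in \mathcal{S}(\mathbf{Y},r)$, then there is a sufficiently large integer $k_0$ such that
$\mathbf{X} \in \mathcal{S}(\mathbf{Y}^k,r)$ for all $k \ge k_0$, and

$$\lim_{k\to\infty} \mathcal{T}_{\mathbf{Y}^k,r}(\mathbf{X}) = \mathcal{T}_{\mathbf{Y},r}(\mathbf{X}). \tag{2.6}$$

Proof. By the assumption $\sigma_r(\mathbf{Y}) > \sigma_{r+1}(\mathbf{Y})$, $\mathcal{U}(\mathbf{Y},r)$ is a singleton. Let
$\mathbf{U} \in \mathcal{U}(\mathbf{Y},r)$. Then from $\mathbf{X} \in \mathcal{S}(\mathbf{Y},r)$, it follows that
$\mathbf{X}^\top\mathbf{U}$ is nonsingular. Since $\mathbf{Y}^k \to \mathbf{Y}$ as $k \to \infty$, there exists an integer $k_0$
such that $\sigma_r(\mathbf{Y}^k) > \sigma_{r+1}(\mathbf{Y}^k)$, i.e., $\mathcal{U}(\mathbf{Y}^k,r)$ is a singleton for all
$k \ge k_0$. Let $\mathbf{U}^k \in \mathcal{U}(\mathbf{Y}^k,r),\ \forall k$. We can choose the representative $\mathbf{U}^k$
satisfying $\mathbf{U}^k \to \mathbf{U}$, since $\mathbf{Y}^k \to \mathbf{Y}$. Therefore, taking another larger $k_0$ if
necessary, we have that $\mathbf{X}^\top\mathbf{U}^k$ is nonsingular and thus $\mathbf{X} \in \mathcal{S}(\mathbf{Y}^k,r)$ for all
$k \ge k_0$. Finally, using Remark 2.3 and $\mathbf{U}^k \to \mathbf{U}$, we have (2.6) and complete the proof. □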

2.3. Subsequence convergence result. We also need the following result. 

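Lemma 2.8. For any feasible solution $\bar{\mathbf{A}}$, if
$\mathcal{T}_{\bar{\mathbf{G}}_n,r_n}(\bar{\mathbf{A}}_n) = \bar{\mathbf{A}}_n,\ \forall n$, then $\bar{\mathbf{A}}$ is a critical point and
also a block-wise maximizer of (1.2), where

$$\bar{\mathbf{G}}_n = \mathrm{unfold}_n\big(\boldsymbol{\mathcal{X}} \times_{i\neq n} \bar{\mathbf{A}}_i^\top\big). \tag{2.7}$$

Proof. Note that $\mathcal{T}_{\bar{\mathbf{G}}_n,r_n}(\bar{\mathbf{A}}_n) = \bar{\mathbf{A}}_n,\ \forall n$ implies that
$\bar{\mathbf{A}}_n$ is a basis of the dominant $r_n$-dimensional left singular subspace of $\bar{\mathbf{G}}_n$. Hence,
$\bar{\mathbf{A}}_n\bar{\mathbf{A}}_n^\top\bar{\mathbf{G}}_n\bar{\mathbf{G}}_n^\top\bar{\mathbf{A}}_n = \bar{\mathbf{G}}_n\bar{\mathbf{G}}_n^\top\bar{\mathbf{A}}_n,\ \forall n$.
Therefore, $\bar{\mathbf{A}}$ is a critical point.

In addition, $\mathcal{T}_{\bar{\mathbf{G}}_n,r_n}(\bar{\mathbf{A}}_n) = \bar{\mathbf{A}}_n,\ \forall n$ implies that
$\bar{\mathbf{A}}_n$ is a solution to $\max_{\mathbf{A}_n} \|\mathbf{A}_n^\top\bar{\mathbf{G}}_n\|_F^2$ over
$\mathcal{O}_{I_n\times r_n}$ for all $n$. Hence, $\bar{\mathbf{A}}$ is a block-wise maximizer. This completes the proof. □

Now we are ready to show the subsequence convergence result.

Theorem 2.9 (Subsequence convergence). Let $\{\mathbf{A}^k\}_{k\ge 1}$ be the sequence generated from Algorithm 2.
Then any block-nondegenerate limit point of $\{\mathbf{A}^k\}_{k\ge 1}$ is a critical point and a block-wise
maximizer of (1.2).

Proof. Suppose that $\bar{\mathbf{A}}$ is one block-nondegenerate limit point and the subsequence
$\{\mathbf{A}^k\}_{k\in\mathcal{K}}$ converges to $\bar{\mathbf{A}}$. From the update rule in (1.5), it is easy to see

$$\|(\mathbf{A}_n^{k+1})^\top\mathbf{G}_n^k\|_F^2 \le \|\bar{\mathbf{A}}_n^\top\bar{\mathbf{G}}_n\|_F^2,\ \forall k,\ \forall n. \tag{2.8}$$

We claim that $\bar{\mathbf{A}}_1$ is a solution of
$\max_{\mathbf{A}_1\in\mathcal{O}_{I_1\times r_1}} \|\mathbf{A}_1^\top\bar{\mathbf{G}}_1\|_F^2$. Otherwise,
$\|\bar{\mathbf{A}}_1^\top\bar{\mathbf{G}}_1\|_F^2 < \sum_{i=1}^{r_1}\sigma_i^2(\bar{\mathbf{G}}_1)$. Note

$$\lim_{\mathcal{K}\ni k\to\infty} \|(\mathbf{A}_1^{k+1})^\top\mathbf{G}_1^k\|_F^2 = \lim_{\mathcal{K}\ni k\to\infty} \sum_{i=1}^{r_1}\sigma_i^2(\mathbf{G}_1^k) = \sum_{i=1}^{r_1}\sigma_i^2(\bar{\mathbf{G}}_1),$$

which contradicts (2.8). Hence, $\mathcal{T}_{\bar{\mathbf{G}}_1,r_1}(\bar{\mathbf{A}}_1) = \bar{\mathbf{A}}_1$.

Note that $\mathbf{G}_1^k \to \bar{\mathbf{G}}_1$ as $\mathcal{K}\ni k\to\infty$ and
$\mathbf{A}_1^k \in \mathcal{S}(\mathbf{G}_1^k,r_1)$ when $k \in \mathcal{K}$ is sufficiently large. From the
block-nondegeneracy of $\bar{\mathbf{A}}$ and Theorems 2.6 and 2.7, we have

$$\lim_{\mathcal{K}\ni k\to\infty} \mathbf{A}_1^{k+1} = \lim_{\mathcal{K}\ni k\to\infty} \mathcal{T}_{\mathbf{G}_1^k,r_1}(\mathbf{A}_1^k) = \mathcal{T}_{\bar{\mathbf{G}}_1,r_1}(\bar{\mathbf{A}}_1) = \bar{\mathbf{A}}_1. \tag{2.9}$$

Hence, taking a sufficiently large $k \in \mathcal{K}$, we can make $\|\mathbf{A}_1^{k+1} - \mathbf{A}_1^k\|_F$ sufficiently
small, and thus we can repeat the above arguments for $n = 2,\ldots,N$ to conclude

$$\bar{\mathbf{A}}_n \in \operatorname*{argmax}_{\mathbf{A}_n\in\mathcal{O}_{I_n\times r_n}} \|\mathbf{A}_n^\top\bar{\mathbf{G}}_n\|_F^2,\ \forall n.$$

Therefore, from the definition of $\mathcal{T}_{\bar{\mathbf{G}}_n,r_n}$, it holds that
$\mathcal{T}_{\bar{\mathbf{G}}_n,r_n}(\bar{\mathbf{A}}_n) = \bar{\mathbf{A}}_n,\ \forall n$, and $\bar{\mathbf{A}}$ is a critical point and
a block-wise maximizer of (1.2) from Lemma 2.8. □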

Remark 2.4. The result in (2.9) is a key step to have the subsequence convergence. In general, without 
the block-nondegeneracy assumption, it may not hold. 


3. Global sequence convergence. In this section, we assume the existence of one block-nondegenerate
limit point and show global convergence of Algorithm 2. The key tool we use is the so-called
Kurdyka-Łojasiewicz (KL) property (see Definition 3.3 below).


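3.1. Equivalent unconstrained problem. Let
$f(\mathbf{A}) = \|\boldsymbol{\mathcal{X}} \times_1 \mathbf{A}_1^\top \cdots \times_N \mathbf{A}_N^\top\|_F^2$ and

$$g_n(\mathbf{A}_n) = \begin{cases} 0, & \text{if } \mathbf{A}_n \in \mathcal{O}_{I_n\times r_n}, \\ +\infty, & \text{otherwise} \end{cases}$$

be the indicator function on $\mathcal{O}_{I_n\times r_n}$ for $n = 1,\ldots,N$. Also let

$$F(\mathbf{A}) = f(\mathbf{A}) - \sum_{n=1}^N g_n(\mathbf{A}_n).$$

Then (1.2) is equivalent to $\max_{\mathbf{A}} F(\mathbf{A})$, and $\mathbf{A}$ is a critical point of (1.2) if and only if
$0 \in \partial F(\mathbf{A})$, where $\partial F$ denotes the limiting Fréchet subdifferential (see [13] for example).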

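3.2. Bounding iterate distance by objective progress. We show the global convergence of
Algorithm 2 also by analyzing the solution of the subproblem (1.5). As shown below, if there is a positive
gap between $\sigma_{r_n}(\mathbf{G}_n^k)$ and $\sigma_{r_n+1}(\mathbf{G}_n^k)$, the distance between $\mathbf{A}_n^k$ and
$\mathbf{A}_n^{k+1}$ can be bounded by the objective difference.

Theorem 3.1. Given $\mathbf{X} \in \mathcal{O}_{m\times r}$ and $\mathbf{Y} \in \mathbb{R}^{m\times p}$, any solution
$\mathbf{Z}$ of (2.3) satisfies

$$\frac{\sigma_r^2(\mathbf{Y}) - \sigma_{r+1}^2(\mathbf{Y})}{2}\,\|\mathbf{Z} - \mathbf{X}\|_F^2 \le \|\mathbf{Z}^\top\mathbf{Y}\|_F^2 - \|\mathbf{X}^\top\mathbf{Y}\|_F^2. \tag{3.1}$$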


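Proof. Note $\mathbf{Z} = \mathbf{U}\mathbf{W}_z$ for some $\mathbf{U} \in \mathcal{U}(\mathbf{Y},r)$ and
$\mathbf{W}_z \in \mathcal{O}_{r\times r}$. Let
$\mathbf{Y} = \mathbf{U}\boldsymbol{\Sigma}\mathbf{V}^\top + \mathbf{U}_\perp\boldsymbol{\Sigma}_\perp\mathbf{V}_\perp^\top$ be the full
SVD of $\mathbf{Y}$. Also, let $\mathbf{W} = \mathbf{U}^\top\mathbf{X}$ and $\mathbf{W}_\perp = \mathbf{U}_\perp^\top\mathbf{X}$. Then
$\mathbf{X} = \mathbf{U}\mathbf{W} + \mathbf{U}_\perp\mathbf{W}_\perp$ and
$\mathbf{W}^\top\mathbf{W} + \mathbf{W}_\perp^\top\mathbf{W}_\perp = \mathbf{I}$ from $\mathbf{X}^\top\mathbf{X} = \mathbf{I}$.

As in the proof of Proposition 2.3, we have

$$\|\mathbf{Z}^\top\mathbf{Y}\|_F^2 = \sum_{i=1}^r \sigma_i^2(\mathbf{Y}) \tag{3.2}$$

and

$$\|\mathbf{Z} - \mathbf{X}\|_F^2 = 2r - 2\langle \mathbf{Z}, \mathbf{X}\rangle = 2r - 2\langle \mathbf{W}_z, \mathbf{W}\rangle = 2r - 2\sum_{i=1}^r \sigma_i(\mathbf{W}), \tag{3.3}$$

where the last equality is from Lemma 2.1 and the optimality of $\mathbf{Z}$ for (2.3). Also, note that

$$\|\mathbf{X}^\top\mathbf{Y}\|_F^2 = \|\mathbf{W}^\top\boldsymbol{\Sigma}\|_F^2 + \|\mathbf{W}_\perp^\top\boldsymbol{\Sigma}_\perp\|_F^2. \tag{3.4}$$

Assume $\mathbf{W}_\perp = \tilde{\mathbf{U}}\tilde{\boldsymbol{\Sigma}}\tilde{\mathbf{V}}^\top$ to be the full SVD of
$\mathbf{W}_\perp$. Then

$$\mathbf{W}^\top\mathbf{W} = \mathbf{I} - \mathbf{W}_\perp^\top\mathbf{W}_\perp = \tilde{\mathbf{V}}(\mathbf{I} - \tilde{\boldsymbol{\Sigma}}^\top\tilde{\boldsymbol{\Sigma}})\tilde{\mathbf{V}}^\top.$$

Let $\tilde\sigma_1 \ge \tilde\sigma_2 \ge \cdots \ge \tilde\sigma_r$ be the first $r$ largest singular values of
$\mathbf{W}_\perp$. Then $\sigma_i(\mathbf{W}) = \sqrt{1 - \tilde\sigma_{r-i+1}^2}$, and
using Lemma 2.1 again, we have

$$\|\mathbf{W}^\top\boldsymbol{\Sigma}\|_F^2 = \langle \mathbf{W}\mathbf{W}^\top, \boldsymbol{\Sigma}^2\rangle \le \sum_{i=1}^r (1 - \tilde\sigma_i^2)\,\sigma_{r-i+1}^2(\mathbf{Y}), \tag{3.5}$$

and

$$\|\mathbf{W}_\perp^\top\boldsymbol{\Sigma}_\perp\|_F^2 = \langle \mathbf{W}_\perp\mathbf{W}_\perp^\top, \boldsymbol{\Sigma}_\perp\boldsymbol{\Sigma}_\perp^\top\rangle \le \sum_{i=1}^r \tilde\sigma_i^2\,\sigma_{r+i}^2(\mathbf{Y}). \tag{3.6}$$

Hence, from (3.2) and (3.4) through (3.6), we have

$$\begin{aligned}
\|\mathbf{Z}^\top\mathbf{Y}\|_F^2 - \|\mathbf{X}^\top\mathbf{Y}\|_F^2 &= \sum_{i=1}^r \sigma_i^2(\mathbf{Y}) - \|\mathbf{W}^\top\boldsymbol{\Sigma}\|_F^2 - \|\mathbf{W}_\perp^\top\boldsymbol{\Sigma}_\perp\|_F^2 \\
&\ge \sum_{i=1}^r \tilde\sigma_i^2\big(\sigma_{r-i+1}^2(\mathbf{Y}) - \sigma_{r+i}^2(\mathbf{Y})\big) \\
&\ge \big(\sigma_r^2(\mathbf{Y}) - \sigma_{r+1}^2(\mathbf{Y})\big)\sum_{i=1}^r \tilde\sigma_i^2,
\end{aligned} \tag{3.7}$$

where the last inequality is from
$\sigma_r^2(\mathbf{Y}) - \sigma_{r+1}^2(\mathbf{Y}) \le \sigma_{r-i+1}^2(\mathbf{Y}) - \sigma_{r+i}^2(\mathbf{Y}),\ \forall i$. Using the
fact $1 - \sqrt{1-x} \le x,\ \forall x \in [0,1]$, we have

$$2r - 2\sum_{i=1}^r \sigma_i(\mathbf{W}) = 2r - 2\sum_{i=1}^r \sqrt{1 - \tilde\sigma_i^2} \le 2\sum_{i=1}^r \tilde\sigma_i^2,$$

and thus from (3.3), it follows that

$$\|\mathbf{Z} - \mathbf{X}\|_F^2 \le 2\sum_{i=1}^r \tilde\sigma_i^2.$$

Plugging the above inequality into (3.7), we have the desired result. □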
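
The bound of Theorem 3.1 can be probed numerically. The sketch below is illustrative and rests on stated assumptions: it takes the constant to be $(\sigma_r^2(\mathbf{Y}) - \sigma_{r+1}^2(\mathbf{Y}))/2$, which is what combining the last line of (3.7) with $\|\mathbf{Z}-\mathbf{X}\|_F^2 \le 2\sum_i \tilde\sigma_i^2$ gives, and it solves (2.3) with the hypothetical helper `closest_dominant_basis`, which assumes $\sigma_r(\mathbf{Y}) > \sigma_{r+1}(\mathbf{Y})$.

```python
import numpy as np

def closest_dominant_basis(X, Y, r):
    """Solve (2.3) assuming sigma_r(Y) > sigma_{r+1}(Y) (unique dominant subspace)."""
    U = np.linalg.svd(Y, full_matrices=False)[0][:, :r]
    Ub, _, Vbt = np.linalg.svd(U.T @ X)
    return U @ Ub @ Vbt

rng = np.random.default_rng(0)
m, p, r = 8, 6, 3
Y = rng.standard_normal((m, p))
s = np.linalg.svd(Y, compute_uv=False)
gap = s[r - 1] ** 2 - s[r] ** 2                 # sigma_r^2(Y) - sigma_{r+1}^2(Y)

# Smallest value of (right-hand side) - (left-hand side) over random feasible X.
worst_slack = np.inf
for _ in range(200):
    X, _ = np.linalg.qr(rng.standard_normal((m, r)))
    Z = closest_dominant_basis(X, Y, r)
    lhs = 0.5 * gap * np.linalg.norm(Z - X) ** 2
    rhs = np.linalg.norm(Z.T @ Y) ** 2 - np.linalg.norm(X.T @ Y) ** 2
    worst_slack = min(worst_slack, rhs - lhs)
```

A nonnegative `worst_slack` (up to rounding) is consistent with the bound; it is a sanity check on random instances, not a proof.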

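Using Theorem 3.1, we show the following result.

Lemma 3.2. Let $\{\mathbf{A}^k\}_{k\ge 1}$ be the sequence generated from Algorithm 2. Assume it has a
block-nondegenerate limit point $\bar{\mathbf{A}}$. Then there is a constant $\alpha > 0$ such that if $\mathbf{A}^k$ is
sufficiently close to $\bar{\mathbf{A}}$, we have

$$\alpha\|\mathbf{A}^{k+1} - \mathbf{A}^k\|_F^2 \le F(\mathbf{A}^{k+1}) - F(\mathbf{A}^k).$$

Proof. It is easy to see that there exists a small positive number $\delta$ such that if
$\|\mathbf{A} - \bar{\mathbf{A}}\|_F < \delta$, then

$$\sigma_{r_n}^2(\mathbf{G}_n) - \sigma_{r_n+1}^2(\mathbf{G}_n) \ge \alpha_n > 0,\ \forall n,$$

where $\mathbf{G}_n$ is defined in (1.9) and the strict inequality is from the block-nondegeneracy of
$\bar{\mathbf{A}}$. Assume $\mathbf{A}^k$ is sufficiently close to $\bar{\mathbf{A}}$ such that

$$\sum_{n=1}^N \sqrt{\frac{2\big(F(\bar{\mathbf{A}}) - F(\mathbf{A}^k)\big)}{\alpha_n}} + \|\mathbf{A}^k - \bar{\mathbf{A}}\|_F < \delta.$$

From Theorem 3.1, it follows that

$$\frac{\alpha_1}{2}\|\mathbf{A}_1^{k+1} - \mathbf{A}_1^k\|_F^2 \le \|(\mathbf{A}_1^{k+1})^\top\mathbf{G}_1^k\|_F^2 - \|(\mathbf{A}_1^k)^\top\mathbf{G}_1^k\|_F^2 \le F(\bar{\mathbf{A}}) - F(\mathbf{A}^k),$$

where $\mathbf{G}_1^k$ is defined in (1.6), and we have used (2.8). Hence,
$\|\mathbf{A}_1^{k+1} - \mathbf{A}_1^k\|_F \le \sqrt{2\big(F(\bar{\mathbf{A}}) - F(\mathbf{A}^k)\big)/\alpha_1}$ and

$$\|(\mathbf{A}_1^{k+1}, \mathbf{A}_{>1}^k) - \bar{\mathbf{A}}\|_F \le \|\mathbf{A}_1^{k+1} - \mathbf{A}_1^k\|_F + \|\mathbf{A}^k - \bar{\mathbf{A}}\|_F < \delta.$$

Repeating the above arguments, in general, we have for all $n$ that

$$\|\mathbf{A}_n^{k+1} - \mathbf{A}_n^k\|_F \le \sqrt{\frac{2\big(F(\bar{\mathbf{A}}) - F(\mathbf{A}^k)\big)}{\alpha_n}}$$

and

$$\|(\mathbf{A}_{\le n}^{k+1}, \mathbf{A}_{>n}^k) - \bar{\mathbf{A}}\|_F \le \sum_{i=1}^n \|\mathbf{A}_i^{k+1} - \mathbf{A}_i^k\|_F + \|\mathbf{A}^k - \bar{\mathbf{A}}\|_F < \delta.$$

Therefore, every intermediate point $(\mathbf{A}_{<n}^{k+1}, \mathbf{A}_{\ge n}^k)$ is in
$\mathcal{N}(\bar{\mathbf{A}}, \delta) = \{\mathbf{A} : \|\mathbf{A} - \bar{\mathbf{A}}\|_F < \delta\}$, and thus for all $n$,

$$\frac{\alpha_n}{2}\|\mathbf{A}_n^{k+1} - \mathbf{A}_n^k\|_F^2 \le \|(\mathbf{A}_n^{k+1})^\top\mathbf{G}_n^k\|_F^2 - \|(\mathbf{A}_n^k)^\top\mathbf{G}_n^k\|_F^2.$$

Let $\alpha = \min_n \frac{\alpha_n}{2} > 0$. Summing the above inequality from $n = 1$ to $N$ gives the desired result. □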
3.3. Global sequence convergence result. Using Lemma 3.2 and the KL property of $F$, we show
the global convergence of Algorithm 2.

Definition 3.3 (KL property). A function $\psi(\mathbf{x})$ satisfies the KL property at point
$\bar{\mathbf{x}} \in \mathrm{dom}(\partial\psi)$ if there exists $\theta \in [0,1)$ such that

$$\frac{|\psi(\mathbf{x}) - \psi(\bar{\mathbf{x}})|^\theta}{\mathrm{dist}(0, \partial\psi(\mathbf{x}))} \tag{3.8}$$

is bounded around $\bar{\mathbf{x}}$ under the notational conventions: $0^0 = 1$, $\infty/\infty = 0/0 = 0$. In other
words, in a certain neighborhood $\mathcal{N}$ of $\bar{\mathbf{x}}$, there exists $\phi(s) = cs^{1-\theta}$ for some
$c > 0$ and $\theta \in [0,1)$ such that the KL inequality holds:

$$\phi'(|\psi(\mathbf{x}) - \psi(\bar{\mathbf{x}})|)\,\mathrm{dist}(0, \partial\psi(\mathbf{x})) \ge 1, \text{ for any } \mathbf{x} \in \mathcal{N}\cap\mathrm{dom}(\partial\psi) \text{ and } \psi(\mathbf{x}) \neq \psi(\bar{\mathbf{x}}), \tag{3.9}$$

where $\mathrm{dom}(\partial\psi) = \{\mathbf{x} : \partial\psi(\mathbf{x}) \neq \emptyset\}$ and
$\mathrm{dist}(0, \partial\psi(\mathbf{x})) = \min\{\|\mathbf{y}\| : \mathbf{y} \in \partial\psi(\mathbf{x})\}$.

The KL property was introduced by Łojasiewicz [16] on real analytic functions, for which the term with
$\theta \in [\frac{1}{2}, 1)$ in (3.8) is bounded around any critical point $\bar{\mathbf{x}}$. Kurdyka extended this
property to functions on the o-minimal structure in [14]. Recently, the KL inequality (3.9) was extended to
nonsmooth sub-analytic functions [3]. The works [1,23] give many concrete examples that have the property.
The function $F$ is one of their examples and thus has the KL property.
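
To see why the boundedness of (3.8) and the KL inequality (3.9) express the same condition, one can differentiate $\phi$ directly. The following short derivation is added for illustration and assumes $\psi(\mathbf{x}) \neq \psi(\bar{\mathbf{x}})$ and $\mathrm{dist}(0,\partial\psi(\mathbf{x})) > 0$:

```latex
\phi(s) = c\,s^{1-\theta}
\;\Longrightarrow\;
\phi'(s) = c(1-\theta)\,s^{-\theta},
\qquad\text{so}\qquad
\phi'\big(|\psi(\mathbf{x})-\psi(\bar{\mathbf{x}})|\big)\,\mathrm{dist}\big(0,\partial\psi(\mathbf{x})\big) \ge 1
\;\Longleftrightarrow\;
\frac{|\psi(\mathbf{x})-\psi(\bar{\mathbf{x}})|^{\theta}}{\mathrm{dist}\big(0,\partial\psi(\mathbf{x})\big)} \le c(1-\theta).
```

Since $\theta \in [0,1)$ and $c > 0$, $\phi$ is increasing and concave on $(0,\infty)$; both facts are used in the proof of global convergence.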

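Theorem 3.4 (Global sequence convergence). If $\bar{\mathbf{A}}$ is a block-nondegenerate limit point of the
sequence $\{\mathbf{A}^k\}_{k\ge 1}$ generated from Algorithm 2, then $\bar{\mathbf{A}}$ is a critical point of (1.2), and

$$\lim_{k\to\infty} \mathbf{A}^k = \bar{\mathbf{A}}. \tag{3.10}$$

Proof. From Theorem 2.9, we have the criticality of $\bar{\mathbf{A}}$, so we only need to show (3.10). Note that
$F(\mathbf{A}^k)$ is nondecreasing with respect to $k$ and thus converges to $F(\bar{\mathbf{A}})$. We assume
$F(\bar{\mathbf{A}}) > F(\mathbf{A}^k),\ \forall k$. Otherwise, if for some $k_0$, $F(\bar{\mathbf{A}}) = F(\mathbf{A}^{k_0})$, we
must have $\mathbf{A}^k = \mathbf{A}^{k_0},\ \forall k \ge k_0$.

Since $F$ has the KL property, in a neighborhood $\mathcal{N}(\bar{\mathbf{A}}, \rho)$, there exists
$\phi(s) = cs^{1-\theta}$ for some $c > 0$ and $\theta \in [0,1)$ such that

$$\phi'(|F(\mathbf{A}) - F(\bar{\mathbf{A}})|)\,\mathrm{dist}(0, \partial F(\mathbf{A})) \ge 1, \text{ for any } \mathbf{A} \in \mathcal{N}(\bar{\mathbf{A}}, \rho) \text{ and } F(\mathbf{A}) \neq F(\bar{\mathbf{A}}). \tag{3.11}$$

If necessary, taking a smaller $\rho$, we assume

$$\mathbf{A} \in \mathcal{N}(\bar{\mathbf{A}}, \rho) \ \Longrightarrow\ \sum_{n=1}^N \sqrt{\frac{2|F(\bar{\mathbf{A}}) - F(\mathbf{A})|}{\alpha_n}} + \|\mathbf{A} - \bar{\mathbf{A}}\|_F < \delta,$$

where $\delta$ and the $\alpha_n$'s are defined in the same way as those in the proof of Lemma 3.2. Note that there
is a constant $L$ such that

$$\|\nabla_{\mathbf{A}_n} f(\mathbf{A}) - \nabla_{\mathbf{A}_n} f(\tilde{\mathbf{A}})\|_F \le L\|\mathbf{A} - \tilde{\mathbf{A}}\|_F,\ \forall \mathbf{A}, \tilde{\mathbf{A}} \in \mathcal{O},\ \forall n, \tag{3.12}$$

where

$$\mathcal{O} = \{\mathbf{A} : \mathbf{A} = (\mathbf{A}_1,\ldots,\mathbf{A}_N),\ \mathbf{A}_n \in \mathcal{O}_{I_n\times r_n},\ \forall n\}.$$

Since $\bar{\mathbf{A}}$ is a limit point, there is a subsequence $\{\mathbf{A}^k\}_{k\in\mathcal{K}}$ convergent to
$\bar{\mathbf{A}}$. Hence, we can choose a sufficiently large $k_0 \in \mathcal{K}$ such that $\mathbf{A}^{k_0}$ is
sufficiently close to $\bar{\mathbf{A}}$. Without loss of generality, we assume $\mathbf{A}^0$
(otherwise set $\mathbf{A}^{k_0}$ as a new starting point) is sufficiently close to $\bar{\mathbf{A}}$ such that
$\mathbf{A}^0 \in \mathcal{N}(\bar{\mathbf{A}}, \rho)$ and

$$\|\mathbf{A}^1 - \bar{\mathbf{A}}\|_F + \|\mathbf{A}^1 - \mathbf{A}^0\|_F + \frac{NL}{\alpha}\,\phi\big(F(\bar{\mathbf{A}}) - F(\mathbf{A}^0)\big) < \rho,$$

which can be guaranteed from Lemma 3.2 and where $\alpha$ is the same as that in Lemma 3.2.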

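Assume $\mathbf{A}^k \in \mathcal{N}(\bar{\mathbf{A}}, \rho)$ for $k = 0,1,\ldots,K$. We now show
$\mathbf{A}^{K+1} \in \mathcal{N}(\bar{\mathbf{A}}, \rho)$ and thus $\mathbf{A}^k \in \mathcal{N}(\bar{\mathbf{A}}, \rho),\ \forall k$
by induction. For any $n = 1,\ldots,N$, from the optimality of $\mathbf{A}_n^k$ on the problem
$\max_{\mathbf{A}_n} F(\mathbf{A}_{<n}^k, \mathbf{A}_n, \mathbf{A}_{>n}^{k-1})$, it holds that

$$\begin{aligned}
& 0 \in \partial_{\mathbf{A}_n} F(\mathbf{A}_{\le n}^k, \mathbf{A}_{>n}^{k-1}) \\
\Longleftrightarrow\ & 0 \in \nabla_{\mathbf{A}_n} f(\mathbf{A}_{\le n}^k, \mathbf{A}_{>n}^{k-1}) + \partial g_n(\mathbf{A}_n^k) \\
\Longleftrightarrow\ & \nabla_{\mathbf{A}_n} f(\mathbf{A}^k) - \nabla_{\mathbf{A}_n} f(\mathbf{A}_{\le n}^k, \mathbf{A}_{>n}^{k-1}) \in \nabla_{\mathbf{A}_n} f(\mathbf{A}^k) + \partial g_n(\mathbf{A}_n^k).
\end{aligned}$$

Hence, by (3.12),

$$\mathrm{dist}(0, \partial F(\mathbf{A}^k)) \le \sum_{n=1}^N \|\nabla_{\mathbf{A}_n} f(\mathbf{A}^k) - \nabla_{\mathbf{A}_n} f(\mathbf{A}_{\le n}^k, \mathbf{A}_{>n}^{k-1})\|_F \le NL\|\mathbf{A}^k - \mathbf{A}^{k-1}\|_F. \tag{3.13}$$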

Letting Fk = F{A) — F(A^) and (j)k = 4’{Fk), we have 


(l>k — <('fe+l 

>(l^'iFk){Fk-Fk+i) 

^ a|lA'=+i-A'=||2. 

-NL\\A>^ - A>^-^f 

which implies 


(from concavity of 4>) 

(from KL inequality. Lemma 3.2, and (3.13)) 


^||Afc+i _ < iVL||A'= - A'=-i||f ((/)fc - ^k+i) 

^ V^IIA'^+i - A'^IIf < ^JNL\\A>^-A>^-^F{<Pk-<|)k+l) 

^ V^llA'^+i - A'^ll^ < ^IIA'^ - A'=-i||^ + - <(.fe+i)- 

2 2y/a 

Summing the above inequality from $k = 1$ to $K$ and simplifying the summation gives

$$\begin{aligned}
\sum_{k=1}^{K} \|A^{k+1} - A^k\|_F &\le \|A^1 - A^0\|_F + \frac{NL}{\alpha} (\phi_1 - \phi_{K+1}) \\
&\le \|A^1 - A^0\|_F + \frac{NL}{\alpha} \phi_0,
\end{aligned} \tag{3.14}$$

and thus

$$\begin{aligned}
\|A^{K+1} - \bar{A}\|_F &\le \sum_{k=1}^{K} \|A^{k+1} - A^k\|_F + \|A^1 - \bar{A}\|_F \\
&\le \|A^1 - \bar{A}\|_F + \|A^1 - A^0\|_F + \frac{NL}{\alpha} \phi_0 < \rho.
\end{aligned}$$
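The simplification of the summation can be spelled out as follows. This is a filled-in intermediate step, not text from the original proof. It starts from the per-step bound $\sqrt{\alpha}\, s_k \le \frac{\sqrt{\alpha}}{2} s_{k-1} + \frac{NL}{2\sqrt{\alpha}}(\phi_k - \phi_{k+1})$, where the shorthand $s_k = \|A^{k+1} - A^k\|_F$ is introduced here only for illustration. The last inequality assumes $\phi_{K+1} \ge 0$ and $\phi_1 \le \phi_0$, which hold if $\phi$ is nonnegative and increasing and $F(A^k)$ is nondecreasing.

```latex
\begin{align*}
\sqrt{\alpha} \sum_{k=1}^{K} s_k
  &\le \frac{\sqrt{\alpha}}{2} \sum_{k=1}^{K} s_{k-1}
     + \frac{NL}{2\sqrt{\alpha}} \sum_{k=1}^{K} (\phi_k - \phi_{k+1}) \\
  &= \frac{\sqrt{\alpha}}{2} \Big( s_0 + \sum_{k=1}^{K-1} s_k \Big)
     + \frac{NL}{2\sqrt{\alpha}} (\phi_1 - \phi_{K+1}) \\
  &\le \frac{\sqrt{\alpha}}{2}\, s_0 + \frac{\sqrt{\alpha}}{2} \sum_{k=1}^{K} s_k
     + \frac{NL}{2\sqrt{\alpha}} (\phi_1 - \phi_{K+1}).
\end{align*}
% Move the sum on the right to the left-hand side and divide by sqrt(alpha)/2:
\begin{align*}
\sum_{k=1}^{K} s_k
  \le s_0 + \frac{NL}{\alpha} (\phi_1 - \phi_{K+1})
  \le s_0 + \frac{NL}{\alpha}\, \phi_0 .
\end{align*}
```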

Therefore, $A^{K+1} \in \mathcal{N}(\bar{A}, \rho)$ and $A^k \in \mathcal{N}(\bar{A}, \rho)$, $\forall k$, by induction. Hence, (3.14) holds for all $K$, and letting
$K \to \infty$, we conclude that $\{A^k\}_{k \ge 1}$ is a Cauchy sequence and converges. Since $\bar{A}$ is a limit point,
$A^k \to \bar{A}$ as $k \to \infty$. This completes the proof. □

As long as the starting point is sufficiently close to any block-nondegenerate local maximizer, Algorithm
2 will yield an iterate sequence convergent to a local maximizer as summarized below.

Theorem 3.5 (Convergence to local maximizer). Assume Algorithm 2 starts from any point $A^0$ that
is sufficiently close to one block-nondegenerate local maximizer $A^*$ of $F(A)$. Then the sequence $\{A^k\}_{k \ge 1}$
converges to a local maximizer.

Proof. First, note that if some $A^{k_0}$ is sufficiently close to $A^*$ and $F(A^{k_0}) = F(A^*)$, then $A^{k_0}$ must also
be a local maximizer and block-nondegenerate. In this case, $A^k = A^{k_0}$, $\forall k \ge k_0$. Hence, without loss of
generality, we can assume $F(A^k) < F(A^*)$, $\forall k$. Secondly, note that in the proof of Theorem 3.4, we only use
$F(A^k) < F(\bar{A})$ and the sufficient closeness of $A^0$ to $\bar{A}$ to show $\{A^k\}_{k \ge 1}$ to be a Cauchy sequence. Therefore,
repeating the same arguments, we can show that if $A^0$ is sufficiently close to $A^*$, then $\{A^k\}_{k \ge 1}$ is a Cauchy
sequence and thus converges to a block-nondegenerate point $\bar{A}$ near $A^*$. From Theorem 2.9, it follows that $\bar{A}$
is a critical point. We claim $F(\bar{A}) = F(A^*)$, i.e., $\bar{A}$ is a local maximizer. If otherwise $F(\bar{A}) < F(A^*)$, then
by the KL inequality, it holds that $\phi'(F(A^*) - F(\bar{A}))\,\operatorname{dist}(0, \partial F(\bar{A})) \ge 1$, which contradicts $0 \in \partial F(\bar{A})$.
Hence, $F(\bar{A}) = F(A^*)$. This completes the proof. □

From Theorem 3.5, we can easily get the following local convergence to a globally optimal solution. 

Theorem 3.6 (Global optimality). Assume Algorithm 2 starts from any point $A^0$ that is sufficiently
close to one block-nondegenerate globally optimal solution $A^*$ of (1.2). Then the sequence $\{A^k\}_{k \ge 1}$ converges
to a globally optimal solution.

4. Proof of the main theorem. In this section, we analyze the convergence of the original HOOI 
method by relating its iterate sequence to that of the greedy HOOI method. Because any solution to 
each subproblem of the original HOOI method is still a solution after arbitrary rotation, we cannot expect
to establish convergence of the iterate sequence $\{A^k\}_{k \ge 1}$ itself. Instead, we show the convergence of the
projection matrix sequence $\{A^k (A^k)^T\}_{k \ge 1}$.
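The rotation ambiguity can be seen numerically. The following is a minimal sketch, assuming a 3-way tensor and NumPy; the helper names (`objective`, `random_orthonormal`, `rotate_factors`) are illustrative and not taken from the paper. It replaces each factor $A_n$ by $A_n Q_n$ with $Q_n$ orthogonal and checks that the value $\|\mathcal{X} \times_1 A_1^T \cdots \times_N A_N^T\|_F^2$ and the projections $A_n A_n^T$ do not change, even though the factors themselves do.

```python
import numpy as np

def objective(X, A):
    """||X x_1 A_1^T x_2 A_2^T x_3 A_3^T||_F^2 for a 3-way tensor X."""
    core = np.einsum("ijk,ia,jb,kc->abc", X, A[0], A[1], A[2])
    return float(np.sum(core ** 2))

def random_orthonormal(rows, cols, rng):
    """Random matrix with orthonormal columns (reduced QR of a Gaussian matrix)."""
    q, _ = np.linalg.qr(rng.standard_normal((rows, cols)))
    return q

def rotate_factors(A, rng):
    """Right-multiply each factor by a random r_n x r_n orthogonal matrix."""
    return [a @ random_orthonormal(a.shape[1], a.shape[1], rng) for a in A]

rng = np.random.default_rng(1)
X = rng.standard_normal((4, 5, 6))
A = [random_orthonormal(i, r, rng) for i, r in zip(X.shape, (2, 2, 3))]
B = rotate_factors(A, rng)  # rotated factors: same column spaces as A

same_objective = np.isclose(objective(X, A), objective(X, B))
same_projections = all(np.allclose(a @ a.T, b @ b.T) for a, b in zip(A, B))
print(same_objective, same_projections)
```

This is why the analysis tracks $A_n A_n^T$ rather than $A_n$: the projection is the quantity that every valid subproblem solution agrees on.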

First note that 


$$\|\mathcal{X} \times_1 A_1^T \cdots \times_N A_N^T\|_F^2 = \big\langle \mathcal{X},\ \mathcal{X} \times_1 (A_1 A_1^T) \cdots \times_N (A_N A_N^T) \big\rangle. \tag{4.1}$$
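Identity (4.1) can be sanity-checked numerically. The sketch below assumes NumPy and the usual mode-$n$ product convention (the matrix multiplies the $n$-th axis of the tensor); the function names are hypothetical helpers written for this illustration.

```python
import numpy as np

def mode_product(tensor, matrix, mode):
    """Mode-`mode` product: contract the columns of `matrix` with axis `mode` of `tensor`."""
    moved = np.tensordot(matrix, tensor, axes=(1, mode))  # new axis is placed first
    return np.moveaxis(moved, 0, mode)                    # put it back at position `mode`

def multi_mode_product(tensor, matrices):
    """Apply matrices[n] along mode n, for every n."""
    out = tensor
    for n, m in enumerate(matrices):
        out = mode_product(out, m, n)
    return out

def random_orthonormal(rows, cols, rng):
    """Random matrix with orthonormal columns (reduced QR of a Gaussian matrix)."""
    q, _ = np.linalg.qr(rng.standard_normal((rows, cols)))
    return q

def check_identity(seed=0, dims=(5, 6, 7), ranks=(2, 3, 2)):
    """Return both sides of (4.1) for a random tensor and random orthonormal factors."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal(dims)
    A = [random_orthonormal(i, r, rng) for i, r in zip(dims, ranks)]
    core = multi_mode_product(X, [a.T for a in A])           # X x_1 A_1^T ... x_N A_N^T
    projected = multi_mode_product(X, [a @ a.T for a in A])  # X x_1 (A_1 A_1^T) ...
    lhs = float(np.sum(core ** 2))       # squared Frobenius norm of the core
    rhs = float(np.sum(X * projected))   # inner product <X, projected X>
    return lhs, rhs

lhs, rhs = check_identity()
print(abs(lhs - rhs))
```

The right-hand side involves the factors only through $A_n A_n^T$, which is the property the later lemmas rely on.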


We also need the following two lemmas. 

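Lemma 4.1. If $\bar{A}\bar{A}^T = \tilde{A}\tilde{A}^T$ and $\tilde{A}$ is a critical point of (1.2), then $\bar{A}$ is also a critical point.

Proof. Since $\tilde{A}$ is a critical point of (1.2), it holds that $\tilde{G}_n \tilde{G}_n^T \tilde{A}_n = \tilde{A}_n \tilde{A}_n^T \tilde{G}_n \tilde{G}_n^T \tilde{A}_n$ and $\tilde{A}_n^T \tilde{A}_n = I$
for all $n$. Note that $\bar{A}\bar{A}^T = \tilde{A}\tilde{A}^T$ implies $\bar{G}_n \bar{G}_n^T = \tilde{G}_n \tilde{G}_n^T$. Hence, for any $n$,

$$\bar{G}_n \bar{G}_n^T \bar{A}_n \bar{A}_n^T = \tilde{G}_n \tilde{G}_n^T \tilde{A}_n \tilde{A}_n^T = \tilde{A}_n \tilde{A}_n^T \tilde{G}_n \tilde{G}_n^T \tilde{A}_n \tilde{A}_n^T = \bar{A}_n \bar{A}_n^T \bar{G}_n \bar{G}_n^T \bar{A}_n \bar{A}_n^T.$$

Multiplying both sides on the right by $\bar{A}_n$ and noting $\bar{A}_n^T \bar{A}_n = I$ gives

$$\bar{G}_n \bar{G}_n^T \bar{A}_n = \bar{A}_n \bar{A}_n^T \bar{G}_n \bar{G}_n^T \bar{A}_n, \quad \forall n,$$

and thus $\bar{A}$ is a critical point. □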
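The mechanism behind Lemma 4.1 is that $G_n G_n^T$, and hence the criticality condition, depends on the factors only through their projections. The sketch below illustrates this numerically under an assumption: $G_n$ is taken to be the mode-$n$ unfolding of $\mathcal{X}$ multiplied by $A_i^T$ along every mode $i \ne n$, which is the standard HOOI definition but is not restated in this section. Helper names are hypothetical.

```python
import numpy as np

def gram_G(X, A, n):
    """Return G_n G_n^T, with G_n the mode-n unfolding of X times A_i^T for all i != n
    (assumed definition of G_n)."""
    T = X
    for i, a in enumerate(A):
        if i != n:
            # multiply by a^T along mode i: contract axis i of T with the rows of a
            T = np.moveaxis(np.tensordot(a.T, T, axes=(1, i)), 0, i)
    G = np.moveaxis(T, n, 0).reshape(T.shape[n], -1)
    return G @ G.T

def critical_residual(X, A, n):
    """Frobenius norm of G_n G_n^T A_n - A_n A_n^T G_n G_n^T A_n (zero at a critical point)."""
    M = gram_G(X, A, n)
    return float(np.linalg.norm(M @ A[n] - A[n] @ (A[n].T @ M @ A[n])))

def random_orthonormal(rows, cols, rng):
    q, _ = np.linalg.qr(rng.standard_normal((rows, cols)))
    return q

rng = np.random.default_rng(3)
X = rng.standard_normal((5, 6, 4))
A_tilde = [random_orthonormal(i, r, rng) for i, r in zip(X.shape, (2, 3, 2))]
# A_bar has the same projections as A_tilde but different factor matrices.
A_bar = [a @ random_orthonormal(a.shape[1], a.shape[1], rng) for a in A_tilde]

grams_match = all(
    np.allclose(gram_G(X, A_bar, n), gram_G(X, A_tilde, n)) for n in range(3)
)
residuals_match = all(
    np.isclose(critical_residual(X, A_bar, n), critical_residual(X, A_tilde, n))
    for n in range(3)
)
print(grams_match, residuals_match)
```

The factors here are random rather than critical, so the residuals are nonzero; the point is only that they coincide for the two factor sets, so one vanishes exactly when the other does.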

Lemma 4.2. Let $\{A^k\}_{k \ge 1}$ be the sequence generated by the original HOOI method and assume it has a
block-nondegenerate limit point $\bar{A}$. If for some $k_0$, $F(A^{k_0}) = F(\bar{A})$, then there is an integer $K \ge k_0$ such
that $A^k (A^k)^T = A^K (A^K)^T$, $\forall k \ge K$.

Proof. Because $F(A^k)$ is nondecreasing and upper bounded, we have $\lim_{k \to \infty} F(A^k) = F(\bar{A})$ and
$F(A^k) \le F(\bar{A})$, so if $F(A^{k_0}) = F(\bar{A})$, then $F(A^k) = F(\bar{A})$, $\forall k \ge k_0$.

Since $\bar{A}$ is a limit point, there must be an integer $K \ge k_0$ such that $A^K$ is sufficiently close to $\bar{A}$ and
$A^K$ is block-nondegenerate. Hence, $G_1^K$ has a unique dominant $r_1$-dimensional left singular subspace. Note

$$\max_{A_1 \in \mathcal{O}_{I_1 \times r_1}} \|A_1^T G_1^K\|_F^2 = \sum_{i=1}^{r_1} \sigma_i^2(G_1^K) = F(\bar{A}) = \|(A_1^K)^T G_1^K\|_F^2.$$

Therefore, $A_1^K$ and $A_1^{K+1}$ both span the dominant $r_1$-dimensional left singular subspace of $G_1^K$, and
thus $A_1^{K+1} (A_1^{K+1})^T = A_1^K (A_1^K)^T$. Using (4.1), we can repeat the arguments to have $A_n^{K+1} (A_n^{K+1})^T =
A_n^K (A_n^K)^T$, $\forall n$, i.e., $A^{K+1} (A^{K+1})^T = A^K (A^K)^T$. Now starting from $A^{K+1}$ and repeating the arguments, we have the
desired result. □
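The proof uses a standard fact: over matrices with orthonormal columns, $\|A^T G\|_F^2$ is maximized by any basis of the dominant left singular subspace of $G$, and the maximum equals the sum of the largest squared singular values. A small NumPy sketch with a random matrix standing in for $G_1^K$ (the matrix, sizes and names are made up for illustration):

```python
import numpy as np

def subspace_value(A, G):
    """||A^T G||_F^2 for a matrix A with orthonormal columns."""
    return float(np.sum((A.T @ G) ** 2))

rng = np.random.default_rng(4)
G = rng.standard_normal((6, 10))  # stand-in for an unfolding such as G_1^K
r = 2

U, s, _ = np.linalg.svd(G, full_matrices=False)
U_r = U[:, :r]                       # one basis of the dominant subspace
top_sum = float(np.sum(s[:r] ** 2))  # sum of the r largest squared singular values

# A second basis of the same subspace: rotate U_r by a random orthogonal Q.
Q, _ = np.linalg.qr(rng.standard_normal((r, r)))
V_r = U_r @ Q

# Random competitors with orthonormal columns; none should exceed top_sum.
trials = [np.linalg.qr(rng.standard_normal((6, r)))[0] for _ in range(200)]
best_trial = max(subspace_value(A, G) for A in trials)

print(subspace_value(U_r, G) - top_sum, best_trial <= top_sum)
```

Both `U_r` and `V_r` attain the maximum and share the projector `U_r @ U_r.T`, which mirrors the step from "both span the dominant subspace" to equality of the projections.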

By Lemma 4.2, without loss of generality, we assume $F(A^k) < F(\bar{A})$, $\forall k$, in the remaining analysis.
With Lemmas 4.1 and 4.2, we are now ready to prove the main theorem.

Proof. [Proof of Theorem 1.1] 

Part (i): Since $\bar{A}$ is a limit point of $\{A^k\}$, there is a subsequence $\{A^k\}_{k \in \mathcal{K}}$ convergent to $\bar{A}$, and there
is $k_0 \in \mathcal{K}$ such that $A^{k_0}$ is sufficiently close to $\bar{A}$. Without loss of generality, we assume that $A^0$ is sufficiently
close to $\bar{A}$ because otherwise we can set $A^{k_0}$ as a new starting point and the convergence of $\{A^k\}_{k \ge 1}$ is
equivalent to that of $\{A^k\}_{k \ge k_0}$. Let $\{\tilde{A}^k\}$ be the sequence generated by the greedy HOOI method starting
from $\tilde{A}^0 = A^0$. We now show that if $A^0$ is sufficiently close to $\bar{A}$, then

$$A^k (A^k)^T = \tilde{A}^k (\tilde{A}^k)^T, \quad \forall k \ge 1. \tag{4.2}$$


Repeating the same arguments in the proof of Lemma 3.2, we have that if $A^0$ is sufficiently close
to $\bar{A}$, then $\tilde{A}^1$ is also sufficiently close to $\bar{A}$. Note that when $A^0$ is sufficiently close to $\bar{A}$, it is block-nondegenerate
and $\sigma_{r_1}(G_1^0) > \sigma_{r_1+1}(G_1^0)$. Hence, $A_1^1$ and $\tilde{A}_1^1$ both span the dominant $r_1$-dimensional left
singular subspace of $G_1^0$ and thus $A_1^1 (A_1^1)^T = \tilde{A}_1^1 (\tilde{A}_1^1)^T$. Since both $A^0$ and $\tilde{A}^1$ are sufficiently close to $\bar{A}$,
we have $\sigma_{r_2}(\tilde{G}_2^0) > \sigma_{r_2+1}(\tilde{G}_2^0)$. Note $G_2^0 (G_2^0)^T = \tilde{G}_2^0 (\tilde{G}_2^0)^T$. Hence, $A_2^1$ and $\tilde{A}_2^1$ both span the dominant
$r_2$-dimensional left singular subspace of $G_2^0 (G_2^0)^T$ and thus $A_2^1 (A_2^1)^T = \tilde{A}_2^1 (\tilde{A}_2^1)^T$. Repeating the above
arguments, we have $A_n^1 (A_n^1)^T = \tilde{A}_n^1 (\tilde{A}_n^1)^T$, $\forall n$, i.e., $A^1 (A^1)^T = \tilde{A}^1 (\tilde{A}^1)^T$.

Assume that for some integer $K \ge 1$, it holds $A^k (A^k)^T = \tilde{A}^k (\tilde{A}^k)^T$ and $\tilde{A}^k \in \mathcal{N}(\bar{A}, \rho)$ for all $k \le K$,
where $\rho$ is sufficiently small and plays the same role as that in the proof of Theorem 3.4. From (4.1), it follows
that $F(\tilde{A}^k) = F(A^k) < F(\bar{A})$, $\forall k \le K$. Through the same arguments as those in the proof of Theorem 3.4,
we have $\tilde{A}^{K+1} \in \mathcal{N}(\bar{A}, \rho)$, and thus $A^{K+1} (A^{K+1})^T = \tilde{A}^{K+1} (\tilde{A}^{K+1})^T$ by the above arguments that show
$A^1 (A^1)^T = \tilde{A}^1 (\tilde{A}^1)^T$. By induction, we have the result in (4.2).

Taking another subsequence if necessary, we can assume $\{\tilde{A}^k\}_{k \in \mathcal{K}}$ converging to $\tilde{A}$ and thus $\bar{A}\bar{A}^T =
\tilde{A}\tilde{A}^T$ by (4.2). Note that the block-nondegeneracy of $\bar{A}$ is equivalent to that of $\tilde{A}$. Hence, $\tilde{A}$ is block-nondegenerate
and is a critical point and a block-wise maximizer by Theorem 2.9, and $\tilde{A}^k$ converges to $\tilde{A}$
by Theorem 3.4. Therefore, $A^k (A^k)^T$ converges to $\bar{A}\bar{A}^T$. From Lemma 4.1, we have that $\bar{A}$ is a critical
point of (1.2), and from (4.1), $\bar{A}$ is a block-wise maximizer. This completes the proof of part (i).

Part (ii): Let $\{\tilde{A}^k\}_{k \ge 1}$ be the sequence generated by the greedy HOOI method starting from $\tilde{A}^0 = A^0$.
From Theorem 3.5, it follows that $\tilde{A}^k$ converges to a local maximizer $\tilde{A}$ of (1.2). In addition, by similar
arguments as those in the proof of part (i), we can show that (4.2) still holds. Hence, $A^k (A^k)^T$ converges
to $\tilde{A}\tilde{A}^T$, and this completes the proof. □
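Relation (4.2) between the two iterate sequences can be observed numerically. The sketch below is an illustration under stated assumptions, not the paper's Algorithm 2: the original HOOI update takes the SVD basis of the dominant left singular subspace, and the greedy update is modeled as the basis of that same subspace closest in Frobenius norm to the current factor, computed by an orthogonal Procrustes step. All function names and sizes are hypothetical.

```python
import numpy as np

def unfold_partial(X, A, n):
    """Mode-n unfolding of X multiplied by A_i^T along every mode i != n."""
    T = X
    for i, a in enumerate(A):
        if i != n:
            T = np.moveaxis(np.tensordot(a.T, T, axes=(1, i)), 0, i)
    return np.moveaxis(T, n, 0).reshape(T.shape[n], -1)

def dominant_basis(G, r):
    """Orthonormal basis of the dominant r-dimensional left singular subspace of G."""
    U, _, _ = np.linalg.svd(G, full_matrices=False)
    return U[:, :r]

def closest_rotation(U, A_prev):
    """Among all U @ Q with Q orthogonal, return the one closest to A_prev
    in Frobenius norm (orthogonal Procrustes, solved with an SVD)."""
    W, _, Vt = np.linalg.svd(U.T @ A_prev)
    return U @ (W @ Vt)

def sweep(X, A, greedy):
    """One pass over all modes, using the newest factors; returns the updated list."""
    A = list(A)
    for n in range(X.ndim):
        U = dominant_basis(unfold_partial(X, A, n), A[n].shape[1])
        A[n] = closest_rotation(U, A[n]) if greedy else U
    return A

def compare(seed=0, dims=(6, 7, 8), ranks=(2, 3, 2), iters=4):
    """Run both variants from the same start; return the largest projection gap per sweep."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal(dims)
    A = [np.linalg.qr(rng.standard_normal((i, r)))[0] for i, r in zip(dims, ranks)]
    A_tilde = list(A)  # greedy sequence starts from the same point
    gaps = []
    for _ in range(iters):
        A = sweep(X, A, greedy=False)
        A_tilde = sweep(X, A_tilde, greedy=True)
        gaps.append(max(float(np.linalg.norm(a @ a.T - b @ b.T))
                        for a, b in zip(A, A_tilde)))
    return gaps

gaps = compare()
print(max(gaps))
```

For a generic random tensor the relevant singular values are well separated, so the projections of the two sequences agree up to rounding at every sweep, while the factor matrices themselves generally differ by a rotation.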

5. Conclusions. We proposed a greedy HOOI method and established its iterate sequence convergence 
by assuming existence of a block-nondegenerate limit point. Through relating the iterates by the original 
HOOI to those by the greedy HOOI, we have shown the global convergence of the HOOI method, for the 
first time. In addition, if the starting point is sufficiently close to any block-nondegenerate locally optimal 
point, we showed that the original HOOI could guarantee convergence to a locally optimal solution.